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ABSTRACT 

Context. The Gaia satellite will observe about one billion stars and other point-like sources. The astrometric core solution will 
determine the astrometric parameters (position, parallax, and proper motion) for a subset of these sources, using a global solution 
approach which must also include a large number of parameters for the satellite attitude and optical instrument. The accurate and 
efficient implementation of this solution is an extremely demanding task, but crucial for the outcome of the mission. 
Aims. We aim to provide a comprehensive overview of the mathematical and physical models applicable to this solution, as well as 
its numerical and algorithmic framework. 

Methods. The astrometric core solution is a simultaneous least-squares estimation of about half a billion parameters, including the 
astrometric parameters for some 100 million well-behaved so-called primary sources. The global nature of the solution requires an 
iterative approach, which can be broken down into a small number of distinct processing blocks (source, attitude, calibration and 
global updating) and auxiliary processes (including the frame rotator and selection of primary sources). We describe each of these 
processes in some detail, formulate the underlying models, from which the observation equations are derived, and outline the adopted 
numerical solution methods with due consideration of robustness and the structure of the resulting system of equations. Appendices 
provide brief introductions to some important mathematical tools (quaternions and B-splines for the attitude representation, and a 
modified Cholesky algorithm for positive semidefinite problems) and discuss some complications expected in the real mission data. 
Results. A complete software system called AGIS (Astrometric Global Iterative Solution) is being built according to the methods 
described in the paper. Based on simulated data for 2 million primary sources we present some initial results, demonstrating the 
basic mathematical and numerical validity of the approach and, by a reasonable extrapolation, its practical feasibility in terms of data 
management and computations for the real mission. 
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1. Introduction 

The space astrometry mission Gaia, planned to be launched by 
the European Space Agency (ESA) in 2013, will determine accu- 
rate astrometric data for about one billion objects in the magni- 
tude range from 6 to 20. Accuracies of 8-25 micro-arcsec (//as) 
are typically expected for the trigonometric parallaxes, positions 
at mean epoch, and annual proper motions of simple (i.e., ap- 
parently single) stars down to 15th magnitude. The astrometric 
data are complemented by photometric and spectroscopic infor- 
mation collected with dedicated instruments on board the Gaia 
satellite. The mission will result in an astronomical database 
of unprecedented scope, accuracy and completeness becoming 
available to the scientific community around 2021. 

The original interferometric concept for a successor mission 
to Hipparcos, called GAIA (Global Astrometric Interferometer 
for Astrophysics), was described by [Lindegren & Ferryman] 
( [199 6 ) but has since evolved considerably by the incorporation 
of novel ideas (H0g 2008) and as a result of industrial studies 
conducted under ESA contracts ( [Ferryman et al.|2001 j l. The mis- 



sion, now in the final integration phase with EADS Astrium as 
prime contractor, is no longer an interferometer but has retained 
the name Gaia, which is thus no acronym. For some brief but 
up-to-date overviews of the mission, see Lindegren et al. (2008 1 
and Lindegren (2010). The scientific case is most comprehen- 
sively described in the proceedings of the conference The Three- 
Dimensional Universe with Gaia ( |Turon et al.|2005| l. 

In parallel with the industrial development of the satellite, 
the Gaia Data Frocessing and Analysis Consortium (DFAC; 
[Mignard et al.|[2008| l is charged with the task of developing 
and running a complete data processing system for analysing 
the satellite data and constructing the resulting database ('Gaia 
Catalogue'). This task is extremely difficult due to the large 
quantities of data involved, the complex relationships between 
different kinds of data (astrometric, photometric, spectroscopic) 
as well as between data collected at different epochs, the need 
for complex yet efficient software systems, and the interaction 
and sustained support of many individuals and groups over an 
extended period of time. 
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A fundamental part of the data processing task is the astro- 
metric core solution, currently under development in DPAC's 
Coordination Unit 3 (CU3), 'Core Processing' . Mathematically, 
the astrometric core solution is a simultaneous determination of 
a very large number of unknowns representing three kinds of 
information: (i) the astrometric parameters for a subset of the 
observed stars, representing the astrometric reference frame; (ii) 
the instrument attitude, representing the accurate celestial point- 
ing of the instrument axes in that reference frame as a function of 
time; and (iii) the geometric instrument calibration, representing 
the mapping from pixels on the CCD detectors to angular direc- 
tions relative to the instrument axes. Although the astrometric 
core solution is only made for a subset of the stars, the result- 
ing celestial reference frame, attitude and instrument calibration 
are fundamental inputs for the processing of all observations. 
Optionally, a fourth kind of unknowns, the global parameters, 
may be introduced to describe for example a hypothetical devia- 
tion from General Relativity. 

We use the term 'source' to denote any astronomical object 
that Gaia detects and observes as a separate entity. The vast ma- 
jority of the Gaia sources are ordinary stars, many of them close 
binaries or the components of wide systems, but some are non- 
stellar (for example asteroids and quasars). Nearly everywhere 
in this paper, one can substitute 'star' for 'source' without dis- 
tortion; however, for consistency with established practice in the 
Gaia community we use 'source' throughout. 

While the total number of distinct sources that will be ob- 
served by Gaia is estimated to slightly more than one billion, 
only a subset of them shall be used in the astrometric core so- 
lution. This subset, known as the 'primary sources', is selected 
to be astrometric ally well-behaved (see Sect. |6.2| l and consists 
of (effectively) single stars and extragalactic sources (quasars 
and AGNs) that are sufficiently stable and point-like. We as- 
sume here that the number of primary sources is about 10^, i.e., 
roughly one tenth of the total number of objects, although in the 
end it is possible that an even larger number will be used. 

In comparison with many other parts of the Gaia data pro- 
cessing, the astrometric core solution is in principle simple, 
mainly because it only uses a subset of the observations (namely, 
those of the primary sources), which can be accurately modelled 
in a relatively straightforward way. In practice, the problem is 
however formidable: the total number of unknowns is of the or- 
der of 5 X 10*^, the solution uses some 10" individual observa- 
tions extracted from some 70 Terabyte (TB) of raw satellite data, 
and the entangled nature of the data excludes a direct solution. A 
feasible approach has nevertheless been found, including a pre- 
cise mathematical formulation, practical solution method, and 
efficient software implementation. It is the aim of this paper to 
provide a comprehensive overview of this approach. 

Concerning notations we have followed the usual conven- 
tion to denote all non-scalar entities (vectors, tensors, matrices, 
quaternions) by boldfaced characters. Lower-case bold italics 
(a) are used for vectors and one-dimensional column matrices; 
upper-case bold italics (A) usually denote two-dimensional ma- 
1983| l the prime ( ') signifies both ma- 



trices. Following Murray 



trix transpose («', A') and scalar multiplication of vectors; thus 
||fl|| - (a'ay^- defines the magnitude of the vector a as well as the 
Euclidean norm of the column matrix a. Angular brackets denote 
normalization to unit length, as in (x) = Ji:||jr|r'. In this notation, 
no special distinction is thus made between vectors as physical 
entities (also known as geometric, spatial or Euclidean vectors) 
on one hand, and their numerical representations in some coor- 
dinate system as column matrices (also known as list vectors) 
on the other hand. Moreover, list vectors can of course have any 



dimension: a e W. In the coordinate system whose axes are 
aligned with unit vectors x, y, and z, the components of the ar- 
bitrary vector a are given by - x'a, a,. = y'<^^ and a, = 
z'a; if the coordinate system is represented by the vector triad 
S = [jc J z], these components are given by the column matrix 
S'a (cf. Appendix A in Murray|1983[ l. This notation, although 
perhaps unfamiliar to many readers, provides a convenient and 
unambiguous framework for representing and transforming spa- 
tial vectors in different coordinate systems. For a vector- valued 
function f{x), df Idx' denotes the dim(/)xdim(j[:) matrix whose 
(/, y)th element is dfi/dxj. Quaternions follow their own algebra 
(see Appendix |A] for a brief introduction) and must not be con- 
fused with vectors/matrices; quaternions are therefore denoted 
by bold Roman characters (a). When taking a derivative with 
respect to the quaternion a, dx/da denotes the 4x1 matrix of 
derivatives with respect to the quaternion components; dx/da' 
is the transposed matrix. Tables of acronyms and variables are 
provided in Appendix [E] 



2. Outline of the approach 

The astrometric principles for Gaia were outlined already in the 
Hipparcos Catalogue (ESA IlQQTj Vol. 3, Ch. 23) where, based 
on the accumulated experience of the Hipparcos mission, a view 
was cast to the future. The general principle of a global astromet- 
ric data analysis was succinctly formulated as the minimization 
problem: 



min |ir'^-r'"^(*,«)||Ai, 



(1) 



with a slight change of notation to adapt to the present paper. 
Here s is the vector of unknowns (parameters) describing the 
barycentric motions of the ensemble of sources used in the as- 
trometric solution, and « is a vector of 'nuisance parameters' 
describing the instrument and other incidental factors which are 
not of direct interest for the astronomical problem but are never- 
theless required for realistic modelling of the data. The observa- 
tions are represented by the vector Z"*"* which could for example 
contain the measured detector coordinates of all the stellar im- 
ages at specific times. /'^^''^(s,n) is the observation model, e.g., 
the expected detector coordinates calculated as functions of the 
astrometric and nuisance parameters. The norm is calculated in 
a metric Ai defined by the statistics of the data; in practice the 
minimization will correspond to a weighted least-squares solu- 
tion with due consideration of robustness issues (see Sect. |3.6[ ). 

While Eq. ([T]l encapsulates the global approach to the data 
analysis problem, its actual implementation will of course de- 
pend strongly on the specific characteristics of the Gaia instru- 
ment and mission as well as on the practical constraints on the 
data processing task. 

At the heart of the problem is the modelling of data repre- 
sented by the function f^^^'^(s,n) in Eq. ([T]). This is schemati- 
cally represented in the shaded part of the diagram in Fig. [T] It 
shows the main steps for calculating the expected CCD output in 
terms of the various parameters. The data processing, effecting 
the minimization in Eq. ([TJ, is represented in the right part of 
the diagram. In subsequent sections we describe in some detail 
the main elements depicted in Fig. [T] The astrometric (source) 
parameters are represented by the vector s, while the nuisance 
parameters are of three kinds: the attitude parameters a, the ge- 
ometric calibration parameters c, and the global parameters g. 
The observables / are represented by the pixel coordinates (a-, fj) 
of point source images on Gala's CCDs. (In the actual imple- 
mentation of the approach, the minimization problem is formu- 
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Fig. 1. Schematic representation of the main elements of the astrometric data analysis. In the shaded area is the mathematical model 
that allows to calculate the position of the stellar image in pixel coordinates, and hence the expected CCD data, for any given set 
of model parameters. To the right are the processes that fit this model to the observed CCD data by adjusting the parameters in the 
rectangular boxes along the middle. This paper is primarily concerned with the geometrical part of the analysis contained in the 



dashed box. However, a brief outline of the CCD data modelling and processing (bottom part of the diagram) is given in Sect. 3.5 



and Appendix D.2 



lated in terms of the field angles 77, ^ rather than in the directly 
measured pixel coordinates k, fi; see Sects. 3.3 and 3.4 ) 



The various elements of the astrometric solution are de- 
scribed in some detail in the subsequent sections. Section|3]pro- 
vides the mathematical framework needed to model the Gaia 
observations and setting up the least-squares equations for the 
astrometric solution. In the interest of clarity and overview we 
omit in this description certain complications that need to be 
considered in the final data processing system; these are instead 
briefly discussed in Appendix |D] In Sect. |4] we describe the it- 
erative solution method in general terms, and then provide, in 
Sects. 5]and|6] the mathematical details of the most important 
procedures. Section|7]outlines the existing implementation of the 
solution and presents the results of a demonstration run based 
on simulated observations of about 2 million primary sources. 
Appendices [A| to [C| provide brief introductions to three mathe- 
matical tools that are particularly important for the subsequent 
development, namely the use of quaternions for representing the 
instantaneous satellite attitude, the B-spline formalism used to 
model the attitude as a function of time, and a modified Cholesky 
algorithm for the decomposition of positive semidefinite normal 
matrices. 



3. Mathematical formulation of the basic 
observation model 

3. 1 . Reference systems 

The high astrometric accuracy aimed for with Gaia makes it nec- 
essary to use general relativity for modelling the data. This im- 
plies a precise and consistent formulation of the different refer- 
ence systems used to describe the motion of the observer (Gaia), 
the motion of the observed object (source), the propagation of 
light from the source to the observer, and the transformations be- 
tween these systems. The formulation adopted for Gaia (KlionerJ 
2003] |20d4) l is based on the parametrized post-Newtonian (PPN) 
version of the relativistic framework adopted in 2000 by the 
International Astronomical Union (lAU); see |Soffel et al.| ( |2003] l. 
In this section only some key concepts from this formulation are 
introduced. 

The orbit of Gaia and the light propagation from the source 
to Gaia are entirely modelled in the Barycentric Celestial 
Reference System (BCRS) whose spatial axes are aligned with 
the International Celestial Reference System (ICRS, [Feissel &| 



Mignard|1998). The associated time coordinate is the barycen- 



tric coordinate time (TCB). Throughout this paper, all time vari- 
ables denoted f (with various subscripts) must be interpreted as 
TCB. The ephemerides of solar-system bodies (including the 
Sun and the Earth) are also expressed in this reference system. 
Even the motions of the stars and extragalactic objects are for- 
mally modelled in this system, although for practical reasons 
certain eff'ects of the light-propagation time are conventionally 
ignored in this model (Sect.[J!2]i. 
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In order to model the rotation (attitude) of Gaia and the ce- 
lestial direction of the light rays as observed by Gaia, it is ex- 
pedient to introduce also a co-moving celestial reference system 
having its origin at the centre of mass of the satellite and a coor- 
dinate time equal to the proper time at Gaia. This is known as the 
Centre-of-Mass Reference System (CoMRS) and the associated 
time coordinate is Gaia Time (rG)- |Klioner| ( |20(j4) l demonstrates 
how the CoMRS can be constructed in the lAU 2000 frame- 
work in exact analogy with the Geocentric Celestial Reference 
System (GCRS), only for a massless particle (Gaia) instead of 
the Earth. Like the BCRS and GCRS, the CoMRS is kinemati- 
cally non-rotating, and its spatial axes are aligned with the ICRS. 
The celestial reference system (either the CoMRS or the ICRS 
depending on whether the origin is at Gaia or the solar-system 
barycentre) will in the following be represented by the vector 
triad C = [Z F Z], where X, Y, and Z are orthogonal unit vectors 
aligned with the axes of the celestial reference system. That is, X 
points towards the origin [a - 6 - 0), Z towards the north celes- 
tial pole {6 = +90°), and F = Z X Z points to (a = 90°, 6 = 0). 

In the CoMRS the attitude of the satellite is a spatial ro- 
tation in three dimensions, and can therefore be described 



purely classically, for example using quaternions (Sect. 3.3 and 
Appendix [A]i. The rotated reference system, aligned with the 
instrument axes, is known as the Scanning Reference System 
(SRS). Its spatial x, y, z axes (Fig.[2]l are defined in terms of the 
two viewing directions of Gaia / p (in the centre of the preceding 
field of view, PFoV) and / p (in the centre of the following field 
of view, FFoV) as 



x^ih+h), z = </f X /p) , y^zxx. 



(2) 



(The precise definition of /p and /p is implicit in the geometric 
instrument model; see Sect. |3.4| ) The SRS is represented by the 
vector triad S = [a: 3? z] . 

For determining the orbit of Gaia and calibrating the on- 
board clock, it is also necessary to model the radio ranging and 
other ground-based observations of the Gaia spacecraft in the 
same relativistic framework. For this, we also need the GCRS. 
These aspects of the Gaia data processing are, however, not dis- 
cussed in this paper. 

3.2. Astrometric model 

The astrometric model is a recipe for calculating the proper di- 
rection M,(f) to a source (/) at an arbitrary time of observation 
(f) in terms of its astrometric parameters s, and various auxil- 
iary data, assumed to be known with sufficient accuracy. The 
auxiliary data include an accurate barycentric ephemeris of the 
Gaia satellite, bc{t), including its coordinate velocity dbc/dt, 
and ephemerides of all other relevant solar-system bodies. 

For the astrometric core solution every source is assumed to 
move with uniform space velocity relative to the solar-system 
barycentre. In the BCRS its spatial coordinates at time f, are 
therefore given by 



biih) = ft,(fep) + {t, - fep)V/ 



(3) 



where fgp is an arbitrary reference epoch and ft,(fep), v, define 
six kinematic parameters for the motion of the source. The six 
astrometric parameters in s, are merely a transformation of the 
kinematic parameters into an equivalent set better suited for the 
analysis of the observations. The six astrometric parameters are: 

a, the barycentric right ascension at the adopted reference epoch 

^ep 



<5, the barycentric declination at epoch fgp 
TUi the annual parallax at epoch fgp 

jiati = (do-, /df) cos 5, the proper motion in right ascension at 
epoch fep 

= d^i/df the proper motion in declination at epoch fep 
/in - VriTUi/Au the 'radial proper motion' at epoch fgp, i-c, the 
radial velocity of the source (v„) expressed in the same unit 
as the transverse components of proper motion (Au = astro- 
nomical unit). 



As explained in Sect. |3.1| the astrometric parameters refer to the 
ICRS and the time coordinate used is TCB. The reference epoch 
fep is preferably chosen to be near the mid-time of the mission 
in order to minimize statistical correlations between the position 
and proper motion parameters. 

The transformation between the kinematic and the astromet- 
ric parameters is non-trivial ( |Klioner||2003 '), mainly as a con- 
sequence of the practical need to neglect most of the light- 
propagation time t - tt between the emission of the light at the 
source (f») and its reception at Gaia (f). This interval is typically 
many years and its value, and rate of change (which depends on 
the radial velocity of the source), will in general not be known 
with sufficient accuracy to allow modelling of the motion of the 
source directly in terms of its kinematic parameters according 
to Eq. The proper motion components jUu,,,, fisi and radial 
velocity v„ correspond to the 'apparent' quantities discussed by 
|Klioner| ( [j503 , Sect. 8). 

The coordinate direction to the source at time f is calculated 
with the same 'standard model' as was used for the reduction 
of the Hipparcos observations ( |ESA|1997| Vol. 1, Sect. 1.2.8), 
namely 

M,(0 = <r,- + (fp - tep)ipifia*i + qifJ^si + ri^iri) - mibciO/A^) (4) 

where the angular brackets signify vector length normalization, 
and [pi qi r,] is the 'normal triad' of the source with respect 
to the ICRS ( |Murray||T983[ |. In this triad, r, is the barycentric 
coordinate direction to the source at time fep, pi = (Z X r,), and 
Qi = r, X Pi. The components of these unit vectors in the ICRS 
are given by the columns of the matrix 



0'[pi qi r,] 



- sin Oi - sin 6i cos a/ cos 6i cos ff,- 
cos Oi - sin 6i sin a; cos 5,- sin a; 
cos 5i sin <5, 



(5) 



bait) is the barycentric position of Gaia at the time of obser- 
vation, and Au the astronomical unit, fp is the barycentric time 
obtained by coiTecting the time of observation for the Romer 
delay; to sufficient accuracy it is given by 



fp =t + r]bG(t)/c, 



(6) 



where c is the speed of light. 

In Eq. (|4]) the term containing /i„ accounts for the relative 
rate of change of the barycentric distance to the source. This 
term may produce secular variations of the proper motions and 
parallaxes of some nearby stars, which in principle allow their 
radial velocities to be determined astrometrically (Dravins et al. 
[I999[ ). However, for the vast majority of these stars, /i„ can be 
more accurately calculated by combining the spectroscopically 
measured radial velocities with the astrometric parallaxes. Thus, 
although all six astrometric parameters are taken into account 
when computing the coordinate direction, usually only five of 
them are considered as unknowns in the astrometric solution. 

The standard model can be derived by considering the uni- 
form motion of the source in a purely classical framework, using 
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Fig. 2. The Scanning Reference System (SRS) [x y z] is de- 
fined by the viewing directions /p and /p according to Eq. 
The instrument angles (tf, are the spherical coordinates in the 
SRS of the observed (proper) direction u towards the object. See 
Fig. |4]for further definition of the viewing directions. The field 
sizes are greatly exaggerated in this drawing; in reality the astro- 
metrically useful field is 0.72° x 0.69° (along x across scan) for 
each viewing direction. The basic angle is Fc = arccos(/p'/p), 
nominally equal to 106.5°. 



Euclidean coordinates and neglecting the light propagation time 
from the source to the observer (except for the Romer delay). 
To the accuracy of Gaia, relativistic and light-propagation effects 
are by no means negligible, but it can be shown that this model is 
nevertheless accurate enough to model the observations to sub- 
microarcsec accuracy. It is adopted for this purpose because it 
closely corresponds to the conventional interpretation of the as- 
trometric parameters. However, when the astrometric parameters 
are to be interpreted in terms of the barycentric space velocity of 
the source, some of these effects may come into play ( [Lindegren 
|& Dravins|2003l ). 

The transformation from Ui(t) to the observable (proper) di- 
rection M,(f) involves taking into account gravitational light de- 
flection by solar-system bodies and the Lorentz transformation 
to the co-moving frame of the observer (stellar aberration); the 
relevant formulae are given by |Klioner| ( [2003 ) . This transfor- 
mation therefore depends also on the global parameters g, for 
example the PPN parameter y which measures the strength of 
the gravitational light deflection. The calculation uses some aux- 
iliary data, not subject to improvement by the solution, and 
which are here denoted h\ normally they include for example 
the barycentric ephemerides of the Gaia satellite and of solar- 
system bodies, along with their masses. The complete transfor- 
mation can therefore be written symbolically as; 



H,(f) = u{Si, g\t,h), 



(7) 



where the vertical bar formally separates the (updatable) param- 
eters Si and g from the (fixed) given data f and h. Strictly speak- 
ing, the function u returns the coordinates of the proper direction 
in the CoMRS, that is the column matrix C'H,(f)- The source pa- 
rameter vector s is the concatenation of the sub-vectors s, for all 
the primary sources. 

3.3. Attitude model 

The attitude specifies the instantaneous orientation of the Gaia 
instrument in the celestial reference frame, that is the transfor- 



mation from C - [X Y Z] (more precisely, the CoMRS) to 
S = [jc z] (Sect. |3.1| l as a function of time. The spacecraft is 
controlled to follow a specific attitude as a function of time, for 
example the so-called Nominal Scanning Law (NSL; de Bruijne] 
et al.|2010 1, designed to provide good coverage of the whole ce- 
lestial sphere while satisfying a number of other requirements. 
The NSL is analytically defined for arbitrarily long time inter- 
vals by just a few free parameters. 

However, the actual attitude will deviate from the intended 
(nominal) scanning law by up to ~ 1 arcmin in all three axes, 
and these deviations vary on time scales from seconds to min- 
utes depending on the level of physical perturbations and the 
characteristics of the real-time attitude measurements and con- 



trol law (cf. Appendix D.4i. The CCD integration time of (usu- 
ally) 4.42 s means that the true (physical) attitude cannot be ob- 
served, but only a smoothed version of it, corresponding to the 
convolution of the physical attitude with the CCD exposure func- 
tion (Appendix |D.3[ ). This 'effective' attitude must however be 
a posteriori estimable, at any instant, to an accuracy compatible 
with the astrometric goals, i.e., at sub-mas level. For this pur- 
pose the effective attitude is modelled in terms of a finite num- 
ber of attitude parameters, for which it is necessary to choose a 
suitable mathematical representations of the instantaneous trans- 
formation, as well as of its time dependence. For the Gaia data 
processing, we have chosen to use quaternions (AppendixjAjl for 
the former, and B-splines (Appendix [B|) for the latter represen- 
tation. 

At any time the orientation of the SRS (S) with respect to the 
CoMRS (C) may be represented by the attitude matrix 



A = 





Ax. 


AxA 




x'X 


x'Y 


x'Z 


^yx 


A,, 


A,, 


= S'C = 


y'X 


y'Y 


y'Z 


A,x 








z'X 


z'Y 


z'Z_ 



(8) 



which is a proper orthonormal matrix, A A' - I, det(A) = +1. 
This is also called the direction cosine matrix, since the rows 
are the direction cosines of x, y and z in the CoMRS, and the 
columns are the direction cosines of X, Y and Z in the SRS. 

The orthonormality of A implies that the matrix elements 
satisfy six constraints, leaving three degrees of freedom for the 
attitude representation. Although one could parametrize each 
of the nine matrix elements as a continuous function of time, 
for example using splines, it would in practice not be possi- 
ble to guarantee that the orthonormality constraints hold at any 
time. The adopted solution is to represent the instantaneous at- 
titude by a unit quaternion q, which has only four components, 
\qx, qy, q,, q^], satisfying the single constraint qj, + q^.+q^ + q^^ = 
1 . This is easily enforced by normalization. 

The attitude quaternion q(f) gives the rotation from the 
CoMRS (C) to the SRS (S). Using quaternions, their relation is 
defined by the transformation of the coordinates of the arbitrary 
vector V in the two reference systems, 

{S'v,0) =q"'{C'v,0)q. (9) 

In the terminology of Appendix |A.3| this is a frame rotation 
of the vector from C to S. The inverse operation is {C'v,0) = 
q{S'v,0)q-'. 

Using the B-sphne representation in Appendix B, we have 

l(t)^{ZL(-M^ia„B„(t)) , (10) 

where a„ (n = . . . - 1) are the coefficients of the B-splines 
B„(t) of order M defined on the knot sequence {T/t)f^(f The 
function B„(t) is non-zero only for t„ < f < t„+m, which is 
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why the sum in Eq. ( [T0| only extends over the M terms ending 
with n - {. Here, { is the so-called left index of t satisfying 
T( <t < Tf+i . The normalization operator ( ) guarantees that q(r) 
is a unit quaternion for any t in the interval [r^-i , t^] over which 
the spline is completely defined. Although the coefficients a„ are 
formally quaternions, they are not of unit length. The attitude 
parameter vector a consists of the components of the whole set 
of quaternions a„. 

Cubic splines (M - 4) are currently used in this attitude 
model. Each component of the quaternion (before the normal- 
ization in Eq. 10 1 is then a piecewise cubic polynomial with 
continuous value, first, and second derivative for any t; the third 
derivative is discontinuous at the knots. When initializing the 
solution, it is possible to select any desired order of the spline. 
Using a higher order, such as M = 5 (quartic) or 6 (quintic), 
provides improved smoothness but also makes the spline fitting 
less local, and therefore more prone to undesirable oscillatory 
behaviour. The flexibility of the spline is in principle only gov- 
erned by the number of degrees of freedom (that is, in practice, 
the knot frequency), and is therefore independent of the order 
One should therefore not choose a higher order than is warranted 
by the smoothness of the actual effective attitude, which is dif- 
ficult to model a priori (cf. Appendix D.4 1. Determining the op- 
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timal order and knot frequency may in the end only be possible 
through analysis of the real mission data. 

Equation ^ returns the observed direction to the source rel- 
ative to the celestial reference system, or Cm - {ux. My, uzY ■ In 
order to compute the position of the image in the field of view, 
we need to express this direction in the Scanning Reference 
System, SRS (Sect. 3.1 1, or S'm = [u^, Uy, u,]'. The required 
transformation is obtamed by a frame rotation according to 
Eq. 



{S'm, 0) = q"' {Cm, 0)q, 



(11) 
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COS ^ COS If 
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Uy 




COS ^ sin 










sin ^ 





whereupon the instrument angles {ip, are obtained from 

(fi — atan2(My, 

( - atan2(M., + u^.^ 
(12) 

(Fig. |2|, where we adopt the convention that -n < ip < n. The 
field index / and the along-scan field angle rj are finally obtained 
as 

/ = sign((^), T]^ip-fYJ2, (13) 

where the basic angle, Tc, is here a purely conventional value (as 
suggested by the subscript). The field-of-view limitations imply 
that \ri\ < 0.5° and \(\ < 0.5° for any observation. 

Given the time of observation f and the set of source param- 
eters s, attitude parameters a, and global parameters g, the field 
index / and the field angles (77, can thus be computed by appli- 
cation of Eqs. (j7]i, ([T0|-( 13 I. The resulting computed field angles 
are concisely written rjit \ s, a, g), ((t \ s, a, g). 



3.4. Geometric instrument model 

The geometric instrument model defines the precise layout of 
the CCDs in terms of the field angles (77, 0. This layout depends 
on several different factors, including: the physical geometry of 
each CCD; its position and alignment in the focal-plane assem- 
bly; the optical system including its scale value, distortion and 
other aberrations; and the adopted (conventional) value of the 
basic angle Fc. Some of these (notably the optical distortion) are 
different in the two fields of view and may evolve slightly over 
the mission; on the other hand these variations are expected to be 



- PFOV 

- FFOV 



Fig. 3. Layout of the CCDs in Gala's focal plane. The star im- 
ages move from right to left in the diagram. The along-scan (AL) 
and across-scan (AC) directions are indicated in the top left cor- 
ner The skymappers (SMI, SM2) provide source image detec- 
tion, two-dimensional position estimation and field-of-view dis- 
crimination. The astrometric field (AF1-AF9) provides accurate 
AL measurements and (sometimes) AC positions. Additional 
CCDs used in the blue and red photometers (BP, RP), the radial- 
velocity spectrometer (RVS), for wavefront sensing (WFS) and 
basic-angle monitoring (BAM) are not further described in this 
paper One of the rows (AF3) illustrates the system for labelling 
individual CCDs (AF3_1, etc). The nominal paths of two star 
images, one in the preceding field of view (PFoV) and one in the 
following field of view (FFoV) are indicated. For purely tech- 
nical reasons the origin of the field angles (rj, () corresponds to 
different physical locations on the CCDs in the two fields. The 
physical size of each CCD is 45 x 59 mm^. 



very smooth as a function of the field angles. Other factors, such 
as the detailed physical geometry of the pixel columns, may be 
extremely stable but possibly quite irregular on a small scale. As 
a result, the geometric calibration model must be able to accom- 
modate both smooth and irregular patterns evolving on different 
time scales in the two fields of view. 

In the following it should be kept in mind that the astrometric 
instrument of Gaia is optimised for one-dimensional measure- 
ments in the along-scan direction, i.e., of the field angle 77, while 
the requirements in the perpendicular direction {() are much re- 
laxed. This feature of Gaia (and Hipparcos) is a consequence 
of fundamental considerations derived from the requirements 
to determine a global reference frame and absolute parallaxes 
(Lindegren & Bastian 201 1 1. In principle the measurement accu- 
racy in the f direction, as well as the corresponding instrument 
modelling and calibration accuracies, may be relaxed by up to 
a factor given by the inverse angular size of the field of view 
( |Lindegren|2005| l, or almost two orders of magnitude for a field 
of 0.7°. In practice a ratio of the order of 10 may be achieved, in 
which case the across-scan measurement and calibration errors 
have a very marginal effect on the overall astrometric accuracy. 

The focal plane of Gaia contains a total of 106 CCDs 



(Laborie et al. 2007j ), of which only the 14 CCDs in the skymap- 
pers (SMI and SM2) and the 62 in the astrometric field (AFl 
through AF9) are used for the astrometric solution (Fig. [3]). The 
CCDs have 4500 pixel lines in the along-scan (AL, constant 
decreasing rf) direction and 1966 pixel columns in the across- 
scan (AC, constant 77, increasing direction. They are operated 
in the Time, Delay and Integrate (TDI) mode (also known as 
drift-scanning), an imaging technique well-known in astronomy 
from ground-based programmes such as the Sloan Digital Sky 
Survey ( Gunn et al.||1998 1. Effectively, the charges are clocked 
along the CCD columns at the same (average) speed as the mo- 
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tion of the optical images, i.e., 60 arcsec s ' for Gaia. The expo- 
sure (integration) time is thus set by the time it takes the image 
to move across the CCD, or nominally T ^ 4.42 s, if no gate is 
activated. At this exposure time the central pixels will be satu- 
rated for sources brighter than magnitude G - 12|^Gate activa- 
tion, or 'gating' for short, is the adopted method to obtain valid 
measurements of brighter sources. Gating temporarily inhibits 
charge transfer across a certain TDI line (row of pixels AC), 
thus effectively zeroing the charge image and reducing the ex- 
posure time in proportion to the number of TDI lines following 
the gate. A range of discrete exposure times is thus available, the 
shortest one, according to current planning, using only 16 TDI 
lines (15.7 ms). Gate selection is made by the on-board soft- 
ware based on the measured brightness of the source in relation 
to the calibrated full-well capacity of the relevant section of the 
CCD. Measurement errors and the spread in full-well capacities 
make the gate-activation thresholds somewhat fuzzy, and a given 
bright source is not necessarily always observed using the same 
gate. Moreover, a quasi-random selection of the observations of 
fainter sources will also be gated, viz., if they happen to be read 
out at about the same time, and on the same CCD, as a gated 
bright star. The skymappers are operated in a permanently gated 
mode, so that in practice only the last 2900 TDI lines are used in 
SMI and SM2. 

The skymappers are crucial for the real-time operation of 
the instrument, since they detect the sources as they enter the 
field of view, and allow determination of an approximate two- 
dimensional position of the images and (together with data 
from AFl) their speed across the CCDs. This information is 
used by the on-board attitude computer, in order to determine 
which CCD pixels should be read out and transmitted to ground 
(Sect. 3.5 I. The skymappers also allow to discriminate between 
the two viewing directions, since sources in the preceding field 
of view (PFoV) are only seen by SMI and sources in the follow- 
ing field of view (FFoV) are only seen by SM2. In the astromet- 
ric field (as well as in BP, RP and RVS) the two fields of view 
are superposed. 

Most observations acquired in the astrometric field (AF) 
are purely one-dimensional through the on-chip AC binning 



of data (Sect. 3.5 1. However, sources brighter than magnitude 
G - 13 are always observed as two-dimensional images, pro- 
viding accurate information about the AC pixel coordinate (//) as 
well. Some randomly selected fainter images are also observed 
two-dimensionally ('Calibration Faint Stars', CFS). The bright 
sources, CFS and SM data provide the AC information neces- 
sary for the on-ground three-axis attitude determination. 

Because of the TDI mode of observation, AL irregularities of 
the pixel geometry are smeared out and need not be calibrated, 
and any measurement of the AL or AC position must effectively 
be referred back to an 'observation time' half-way through the 
integration. Correspondingly, the pixel geometry can be repre- 
sented by a fiducial 'observation line' {riifj), ^(yu)] traced out in 
the field angles rj, ( as functions of the AC pixel coordinate jj. 
(Fig. |4]). The AC pixel coordinate is defined as a continuous 
variable with fi - \A when the image is centred on the first 
pixel column, ji - 15 at the second pixel column, and so on 
until ju - 1979 at the last (1966th) pixel column. The offset by 
13 pixels allows for the presence of pre-scan pixel data in some 
observations. 




observation line 

nominal position of CCD centre 
CCD outline 



apparent 
path of 
an object 



TJ2 TJ2 

Fig. 4. Schematic illustration of how the field angles (77, are 
defined in terms of the CCD layout in Fig. [3] For simplicity 
only the projections of two CCDs, AF8_3 and AF8_4, into the 
Scanning Reference System (SRS) are shown (not to scale). The 
field angles have their origins at the respective viewing direction 
/p or /f (Fig. |2|, which are defined in relation to the nominal 
centres of the CCDs (crosses); the actual configuration of the 
detectors is described by fiducial observation lines according to 
Eq. ( 15 I. The dashed curve shows the apparent path of a stel- 



lar image across the two fields of view. Its intersection with the 
observation fines define the instants of observations. 



Nominally, the observation line corresponds to the {KI2)t\\ 
pixel line projected backwards through the optical instrument 
onto the Scanning Reference System on the sky, where K is the 
number of active AL pixel lines in the observation (normally 
K = 4500)0 For a gated observation K is much smaller and 
the observation line is therefore correspondingly displaced to- 
wards the CCD readout register A separate set of geometrical 
calibration parameters is therefore needed for each gate being 
used. In the calibration updating (Sect. |5.3| l all the calibration 
parameters are however solved together, with the overlap due 
to the fuzzy gate-activation thresholds providing the necessary 
connection between the different gates. 

Let n be an index identifying the different CCDs used for as- 
trometry, i.e., for each of the 76 CCDs in the SM and AF part of 
the focal plane. Furthermore, let g be a gate index such that g - Q 
is used for non-gated observations and g - 1, . . . , 12 for gated 
observations of progressively brighter sources. In each field of 
view (index /) the nominal observation lines |^77'^^j^,(//), ^"j^(yu)j 
could in principle be calculated from the nominal characteristics 
of the focal plane assembly (FPA) and ray tracing through the 
nominal optical system. However, since the nominal observation 
lines are only used as a reference for the calibration of the actual 
observation lines, a very simplistic transformation from linear 
coordinates to angles can be used without introducing approx- 
imation errors in the resulting calibration model. The nominal 
observation lines are therefore defined by the transformation 



ri%Ui)^tg--yvvK[n,g]IF I 
4^(/i) = - (Xfpa[«] - 0" - lic)p AC - ^Fprm) IP j 



(14) 



' G is the broad-band magnitude measured on Gala's unfiltered 
CCDs. G - V for unreddened early-type stars, while G - V - 2 for 



M stars. See Jordi et al. ( 2010b for a precise characterization of G. 



where Xfpa, J'fpa are physical coordinates (in mm) in the fo- 
cal planeplXFPA[n] is the physical AC coordinate of the nom- 

- In reality the definition of the fiducial observation hne is a bit more 
complex, as some of the pixel lines are blocked out by an aluminium 
mask. 

^ For consistency with notations adopted by the ESA project team 
and the industrial contractor, and extensively used, e.g., for on-ground 
calibrations, +Xfpa is oriented along -<f and +yFPA along —rj. 
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inal centre of the nth CCD, Ypp\[n,g] the physical AL coor- 
dinate of the nominal observation line for gate g on the «th 
CCD, and Xpp'^'^'=[/] the physical AC coordinate of the nomi- 
nal field centre for field index /. fi^ - 996.5 is the AC pixel 
coordinate of the CCD centre, p^c = 30 fj.m the physical AC 
pixel size, and F = 35 m the nominal equivalent focal length 
of the telescope. While T/Jj^ is independent of /, in the AC di- 
rection the origins are offset by about ±221 arcsec between the 
two fields of view, corresponding to the physical coordinates 
X^lTif] = (-37.5 mm)/. 

It is emphasized that the nominal observation lines are purely 
conventional reference quantities, and need not be recomputed, 
e.g., once a more accurate estimate of F becomes available. 

Because of possible changes in the instrument during the 
mission, the actual observation lines will be functions of time. 
The time dependence is quantified by introducing independent 
sets of calibration parameters for successive, non-overlapping 
time intervals. Different groups of parameters may develop on 
different time scales, and the resulting formulation can be quite 
complex. For the sake of illustration, let us distinguish between 
three categories of geometric calibration parameters: 

1 . Large-scale AL calibration, A77. This may (minutely but im- 
portantly) change due to thermal variations in the optics, the 
detectors, and their supporting mechanical structure. These 
variations could occur on short time scales (of the order of 
a day), and would in general be different in the two fields 
of view. They are modelled as low-order polynomials in the 
across-scan pixel coordinate fi. 

2. Small-scale AL calibration, drj. This is mainly related to 
physical defects or irregularities in the CCDs themselves, for 
example 'stitching errors' introduced by the photolithogra- 
phy process used to manufacture the CCDs. These irregular- 
ities are expected to be stable over very long time scales, pos- 
sibly throughout the mission, and should be identical in both 
fields of view. They require a spatially detailed modelling, 
perhaps with a resolution of just one or a few AC pixels. 

3. Large-scale AC calibration, A^. Although the physical origin 
is the same as for Arj, the AC components can be assumed 
to be constant on long time scales, since the calibration re- 
quirement in the AC direction is much more relaxed than in 
the AL direction. They are modelled as low-order polynomi- 
als in the field angles, separately in each field of view. 

Let index j identify the 'short' time intervals needed for the 
large-scale AL calibration, and index k identify the 'long' time 
intervals applicable to the small-scale AL calibration and large- 
scale AC calibration. That is, an observation at time t belongs to 
some short time interval j and some long time interval k, where 
j and k are readily computed from the known t and the corre- 
sponding sequences of division times|^Assuming that quadratic 
polynomials in jj are sufficient for the large-scale calibrations, 
and that full AC pixel resolution is required for the small-scale 
AL calibration, the observation lines at time t are modelled as 



2 



.//^-13.5 



r=0 



1966 
1966 



(15) 



Technically, the use of independent parameter values in successive 
time intervals represents a spline of order 1 (i.e., degree 0), the sepa- 
ration times constitute the knot sequence, and j or k correspond to the 
'left index' (Appendix|B.2|. 



where L*(x) is the shifted Legendre polynomial of degree r (or- 
thogonal on [0,1]), i.e., L*(jt:) = 1, L*(jc) = 2;c - 1, L^ix) = 
6x^ - 6x -H 1, etc; Arjrfngj are the large-scale AL parameters, 
Srjngkm the small-scale AL parameters (with m - round(/i) the 
index of the nearest pixel column), and A^^jngk the large-scale 
AC parameters. 

In order to render all the geometric calibration parameters 
uniquely determinable, a number of constraints are rigorously 
enforced by the astrometric solution. Effectively, they define the 
origins of the field angles, i.e., the viewing directions /p and /p, 
to coincide with the average nominal field angles of the CCD 
centres. The necessary constraints are: 



f neAP 



(16) 



for each combination rnA:, (17) 



^ A^o/«o/t = for each combination fk, (18) 



Note that g - throughout in Eq. ( 16 1-( 18 1. That the constraints 



are only defined in terms of the non-gated observations (g = 0) 
implies that the observation lines for the gated observations must 
be calibrated relative to the non-gated observations. This is pos- 
sible since any given bright source will not always be observed 
with the same gate. 

Constraint ( 16 1 effectively defines the zero point of the AL 
field angle 77 by requiring that A77 = when averaging over the 
CCDs and between the two fields of view. Constraint ^TJ) im- 
plies that the small-scale AL corrections 61] do not have any com- 
ponents that could be described by A?; instead; it therefore en- 
sures a unique division between the large-scale and small-scale 
components. Constraint ( [18] ) effectively defines the zero point of 
the AC field angle ( by requiring that A^ = 0, separately in each 
field of view, when averaged over the CCDs. The sums over n 
in Eqs. ( 16 1 and (fTSjl are restricted to the CCDs in the astromet- 



ric field (AF), since the skymapper (SM) measurements are less 
accurate. 



The basic angle introduced in Sect. 3.3 is a fixed reference 



value, and any real variations of the angle between the two lines 
of sight will therefore show up as a variation in A77 with opposite 
signs in the two fields of view. The offset of the actual basic 
angle with respect to the conventional value Fc may be defined 
as the average difference of Arj between the two fields of view, 
where the average is computed over the astrometric CCDs, for 
gate g = 0, and over yu; the result for time period j is 



(19) 



neAF / 



Although this is obviously a useful quantity to monitor, it does 
not appear as a parameter in the geometric calibration model. 
A number of additional quantities representing the mean scale 
offset, the mean field orientation, etc., can similarly be computed 
from the large-scale calibration parameters for the purpose of 
monitoring. 

Equation ( [T5] l encapsulates a specific formulation of the ge- 
ometric instrument model, with certain assumptions about the 
shape, dependencies, and time scales of possible variations. 
While this particular model is currently believed to be sufficient 
to describe the behaviour of the actual instrument to the required 
accuracy, it is very likely that modifications will be needed af- 
ter a first analysis of the flight data. Moreover, in the course of 
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the data analysis one may want to try out alternative models, or 
examine possible systematics resulting from the pre-processing 
(location estimator). In order to facilitate this, a much more flex- 
ible generic calibration model has been implemented. In the 
generic model, the 'observed' field angles (representing the true 
observation lines) for any observation I are written 



„obs 



/•={) 



(20) 



where each of the £,(1) (for brevity dropping superscript 
AL/AC) represents a basic calibration effect, being a linear com- 
bination of elemental calibration functions Or,: 



K,-l 



ErU) = 2 C„<D„(0 . 



(21) 



A^AL and Nac are the number of effects considered along and 
across scan. The whole set of coeflicients c^., forms the calibra- 
tion vector c. 

In the generic formulation, the multiple indices /, n, g and 
the variables fj. and t are replaced by the single observation in- 
dex /. This allows maximum flexibility in terms of how the cal- 
ibration model is implemented in the software. The functions 
<!)„ receive the observation index I, and it is assumed that this 
index suffices to derive from it all quantities needed to evalu- 
ate the calibration functions for this observation. Examples of 
quantities that can be derived from the observation index are: 
the FoV index /, the CCD and gate indices n and g, the AC 
pixel coordinate fi, time f, and any relevant astrometric, pho- 
tometric or spectroscopic parameter of the source (magnitude, 
colour index, etc.). Intrinsically real-valued quantities such as 
t can be subdivided into non-overlapping intervals with differ- 
ent sets of calibration parameters applicable to each interval. 
The basic calibration model ( 15 1 can therefore be implemented 



as a particular instance of the generic model, with for example 
L*ix) (r = 0, 1,2) constituting three of the calibration functions 
(with fi derived from /). Once the calibration functions have been 
coded, the entire calibration model can be conveniently specified 
(and changed) through an external configuration file alone using, 
e.g., XML structures. 

Some elemental calibration functions may be introduced for 
diagnostic purposes rather than actual calibration. An example 
of this is any function depending on the colour or magnitude 
of the source. The origin of such effects is briefly explained in 
Appendix |D . 1 1 and D.2 Magnitude- and colour-dependent vari- 
ations of the instrument response are expected to be fully taken 
into account by the signal modelling on the CCD data level, as 
outlined in Sect. 3.5 (notably by the LSF and CDM calibrations). 



and should not show up in the astrometric solution. Thus, non- 
zero results for such 'non-geometric' diagnostic calibration pa- 
rameters indicate that the signal modelling should be improved. 
In the final solution the diagnostic calibration parameters should 
ideally be zero. 

The parameters of the generic calibration model must satisfy 



a number of constraints similar to Eqs. ( 16 1-( 18 1. These can be 
cast in the general form 

C'c = 0, (22) 

where the matrix C contains one column, with known coeffi- 
cients, for each constraint. The columns are, by design, linearly 
independent. 



3.5. Signal (CCD data) model 

As suggested in Fig.[T] the modelling of CCD data at the level of 
individual pixels (i.e., the photon counts) is not part of the geo- 
metrical model of the observations with which we are concerned 
in this paper However, the processing of the photon counts, ef- 
fectively by fitting the CCD data model, produces the 'observa- 
tions' that are the input to the astrometric core solution. In order 
to clarify the exact meaning of these observations we include 
here a brief overview of the signal model. 

The pixel size, 10 fim ^ 59 mas in the along-scan (AL) 
direction and 30 fim ^ 177 mas in the across-can (AC) direc- 
tion, roughly matches the theoretical diffraction image for the 
1.45 X 0.50 m^ telescope pupil of Gaia (effective wavelength 
~ 650 nm). Around each detected object, only a small rectangu- 
lar window (typically 6-18 pixels long in the AL direction and 
12 pixels wide in the AC direction) is actually read out and trans- 
mitted to the ground. Moreover, for most of the observations in 
the astrometric field (AF), on-chip binning in the serial register 
is used to sum the charges over the window in the AC direction. 
This effectively results in a one-dimensional image of 6-18 AL 
'samples', where the signal (Nk) in each sample k is the sum of 
12 AC pixels. The exact time when a sample was transferred 
to the serial register, expressed on the TCB scale, is in princi- 
ple known from the time correlation of the on-board clock with 
ground-based time signals. Because of the known one-to-one re- 
lation between the TDI period counter k and t^, we may use k as 
a proxy for tt in subsequent calculations. 

For single stars, and in the absence of the effects discussed in 



Appendix D.2 the sample values in the window are modelled as 
a stochastic variable (Poisson photon noise plus electronic read- 
out noise) with expected value 



Ak =E(Nk)=/3 + aL{k-K) 



(23) 



where /?, a and k are the so-called image parameters represent- 
ing the (uniform) background level, the amplitude (or flux) of 
the source, and the AL location (pixel coordinate) of the image 
centroid. The continuous, non-negative function L{x) is the Line 
Spread Function (LSF) appropriate for the observation. L(x) de- 
pends, for example, on the spectrum of the source and on the 
position of the image in the focal plane. The (in general non- 
integer) pixel coordinate k is expressed on the same scale as 
the (integer) TDI index k, and may be translated to the equiv- 
alent TCB t{K) by means of the known relation between k and 
tk- The CCD observation time is defined as f(/c - K/2), where 
K is the number of TDI periods used for integrating the image 
(see Appendix |D.3[ ). Formally, the CCD observation time is the 
instant of time at which the centre of the source image passed 
across the CCD fiducial line halfway between the first and the 
last TDI line used in the integration (this will depend on the gat- 
ing). 

Fitting the CCD signal model to the one-dimensional sam- 
ple values Nk thus gives, as the end result of observation I, an 
estimate of the AL pixel coordinate k of the image in the pixel 
stream, which is then transformed to the observation time f;. The 
fitting procedure also provides an estimate of the uncertainty in 
K, which can be expressed in angular measure as a formal stan- 
dard deviation of the AL measurement, crf^. It is derived by 
error propagation through the fitted signal model, taking into 
account the dominant noise sources, photon noise and readout 
noise. 

For some observations, AC information is also provided 
through two-dimensional sampling of the pixel window around 
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the detected object. This applies to all SM observations, AF ob- 
servations of bright (G < 13) sources, and AF observations 



of Calibration Faint Stars (Sect. 3.4 1. The modelling of the 
two-dimensional images follows the same principles as outlined 
above for the one-dimensional (AL only) case, only that the LSF 
is replaced by a two-dimensional Point Spread Function (PSF) 
and that there is one more location parameter to estimate, namely 
the AC pixel coordinate jj.. The astrometric result in this case 
consists of the observation time f/, the observed AC coordinate 
fii, and their formal standard uncertainties crf^ and cr^*^ (both 
expressed as angles). 

The estimation errors for different images are, for the sub- 
sequent analysis, assumed to be statistically independent (and 
therefore uncorrected). This is a very good approximation to the 
extent that they only depend on the photon and readout noises. 
However, modelling errors at the various stages of the processing 
(in particular CTl effects in the signal modelling [Appendix|D.3| 



and attitude irregularities [Appendix D.4|) are likely to intro- 
duce errors that are correlated at least over the nine AF observa- 
tions in a field transit. The resulting correlations as such are not 
taken into account in the astrometric solution (i.e., the weight 
matrix of the least-squares equations is taken to be diagonal), 
but the sizes of the modelling errors are estimated, and are em- 
ployed to reduce the statistical weights of the observations as 
described in Sect. 3.6 The AL and AC estimates of a given (two- 
dimensional) image are roughly independent at least in the limit 
of small optical aberrations. 

3.6. Synthesis model 

By synthesis of the models described in the preceding sections, 
we are now in a position to formulate very precisely the core 
astrometric data analysis problem as outlined in Sect. |2] The 
unknowns are represented by the vectors s, a, c, and g of re- 
spectively the source, attitude, calibration, and global parame- 
ters. For any observation I the observed quantities are the ob- 
servation time f/ and, where applicable, the observed AC pixel 
coordinate with their formal uncertainties crf^, cr^'^. We then 
have the global minimization problem 



/•dAL\2 



(e^)2 



/nAC\2 ,„AC 



(24) 



where 



Rf-^is, a, c, g) = 7]f„g(Mi, ti\c)- j](ti I s, a, g) , (25) 



Rf-^is, a, c, g) = (fngi/Ji, f; I c) - ati I a, g) 



(26) 



are the residuals in the field angles, taken as functions of the un- 
known^ and I e AL refers to observations with a valid AL com- 
ponent, etc. The applicable indices /, n, g are of course known 



for every observation /. In Eq. (|24|), e^^ and ef"'^ represent all AL 



and AC error sources extraneous to the formal uncertainties; they 
include in particular modelling errors in the source behaviour 
(e.g., for unrecognized binaries), attitude and calibration, which 
have to be estimated as functions of time and source in the course 
of the data analysis process, wf^^ and wf^ are weight factors, al- 
ways in the range to 1 ; for most observations they are equal to 
1, but 'bad' data (outliers) are assigned smaller weight factors. 
The determination of these factors is discussed in Sect. 15.1.21 



^ Note that in Eq. \25\ the quantity yU; is just a given value (observed 
or computed); in the case of one-dimensional images the observed yi/ is 
replaced by an approximate value derived from current knowledge on 
the source and attitude. 



For the sake of conciseness we shall hereafter consider the 
AL and AC components of an observation to have separate ob- 
servation indices /, so that for example Ri stands for either R^^ 



or R^'^, as the case may be. This allows the two sums in Eq. (124 1 
to be contracted and written concisely as 



Q(s,a,c,g) = ^ 



R^ w, 



(27) 



where Wi - wi/(aj + ef) is the statistical weight of the observa- 



tion. 



The excess noise e/ represents modelling errors and should 
ideally be zero. However, it is unavoidable that some sources do 
not behave exactly according to the adopted astrometric model 
(Sect. \3.2\, or that the attitude sometimes cannot be represented 



by the spline model in Sect. 3.3 to sufficient accuracy. The ex- 
cess noise term in Eq. ( pT] ) is introduced to allow these cases to 
be handled in a reasonable way, i.e., by effectively reducing the 
statistical weight of the observations affected. It should be noted 
that these modelling errors are assumed to affect all the observa- 
tions of a particular star, or all the observations in a given time 
interval. (By contrast, the downweighting factor wi is intended 
to take care of isolated outliers, for example due to a cosmic-ray 
hit in one of the CCD samples.) This is reflected in the way the 
excess noise is modelled as the sum of two components. 



-er + e„(f/), 



(28) 



where e,- is the excess noise associated with source / (if I e 
that is, I is an observation of source ;), and ea(t) is the excess 
attitude noise, being a function of time. For a 'good' primary 
source, we should have e, = 0, and for a 'good' stretch of attitude 
data we may have ea(0 = 0. Calibration modelling errors are not 
explicitly introduced in Eq. ( 28 1, but could be regarded as a more 



or less constant pait of the excess attitude noise. The estimation 
of e, is described in Sect. |5.1.2[ and the estimation of eait) in 
Sect. l5T5] 

Three separate functions are needed to describe the excess 
attitude noise, coiTesponding to AL observations, AC observa- 
tions in the preceding field of view (ACP), and AC observations 
in the following field of view (ACF). We distinguish between 
these functions by letting the subscript a in e^it) stand for either 
AL, ACP or ACF 



4. Solving the global minimization problem 

Assuming 10** primary sources, the number of unknowns in the 
global minimization problem, Eq. (24i, is about 5 x 10** for the 
sources (s), 4x 10^ for the attitud e (a, a ssuming a knot interval of 



15 s for the 5 yr mission; cf Sect. 5.2. 1 1, 10 for the calibration c. 



and less than 100 global parameters (g). The number of elemen- 
tary observations (/) considered is about 8 x 10"^. However, the 
size of the data set, and the large number of parameters, would 
not by themselves be a problem if the observations, or sources, 
could be processed sequentially. The difficulty is caused by the 
strong connectivity among the observations: each source is ef- 
fectively observed relative to a large number of other sources si- 
multaneously in the field of view, or in the complementary field 
of view some 106.5° away on the sky, linked together by the at- 
titude and calibration models. The complexity of the astrometric 
solution in terms of the connectivity between the sources pro- 
vided by the attitude modelling was analysed by Bombrun et al. 
( |2010| l, who concluded that a direct solution is infeasible, by 
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many orders of magnitude, with today's computational capabil- 
ities. The study neglected the additional connectivity due to the 
calibration model, which makes the problem even more unreal- 
istic to attack by a direct method. Note that this impossibility is 
not a defect, but a virtue of the mathematical system under con- 
sideration: it guarantees that a unique, coherent and completely 
independent global solution for the whole sky can be derived 
from the system. 

The natural alternative to a direct solution is to use some 
iterative method. This is in fact the standard way to deal with 
large, sparse systems of equations. The literature in the field is 
vast and a plethora of methods exist for various kinds of appli- 
cations. However, the iterative method adopted for Gaia did not 
spring from a box of ready-made algorithms. Rather, it was de- 
veloped over several years in parallel with the software system 
in which it could be implemented and tested. Originally based 
on an intuitive and somewhat simplistic approach, the algorithm 
has developed through a series of experiments, insights and im- 
provements into a rigorous, efficient and well-understood pro- 
cedure, completely adapted to its particular application. In this 
section we first describe the approach in broad outline, then pro- 
vide the mathematical background for its better understanding 
and further development. 

4.1. Outline of the iterative solution 

The numerical approach to the Gaia astrometry is a block- 
iterative least-squares solution, referred to as the Astrometric 
Global Iterative Solution (AGIS). In its simplest form, four 
blocks are evaluated in a cyclic sequence until convergence. The 
blocks map to the four different kinds of unknowns outlined in 
Sect.|3] namely: 

S: the source (star) update, in which the astrometric parameters 

s of the primary sources are improved; 
A: the attitude update, in which the attitude parameters a are 

improved; 

C: the calibration update, in which the calibration parameters c 
are improved; 

G: the global update, in which the global parameters g are im- 
proved. 

The G block is optional, and will perhaps only be used in some 
of the final solutions, since the global parameters can normally 
be assumed to be known a priori to high accuracy. 

The blocks must be iterated because each one of them needs 
data from the three other processes. For example, when com- 
puting the astrometric parameters in the S block, the attitude, 
calibration and global parameters are taken from the previous it- 
eration. The resulting (updated) astrometric parameters are used 
the next time the A block is run, and so on. This iterative ap- 
proach to the astrometric solution was proposed early on in the 
Hipparcos project as an alternative to the 'three-step method' 
subsequently adopted for the original Hipparcos reductions; see 
ESA| ( [T997l Vol. 3, p. 488) and references therei n. Indeed, the 
later re-reduction of the Hipparcos raw data by |van Leeuwen| 
pOOTj lused a very similar iterative method, and yielded signifi- 
cantly improved results mainly by virtue of the much more elab- 
orate attitude modelling implemented as part of the approach. 

While the block-iterative solution as outlined above is in- 
tuitive and appealing in its simplicity, it is from a mathemati- 
cal standpoint not obvious that it must converge; and if it does 
indeed converge, it is not obvious how many iterations are re- 
quired, whether the order of the blocks in each iteration matter. 



and whether the converged results do in fact constitute a solu- 
tion to the global minimization problem. These are fundamen- 
tal questions for the validity of the adopted iterative approach, 
and we therefore take some care in the following subsections to 
establish its theoretical foundations (Sects. |4.2f|4.5| l. Section [5] 
then describes each of the S, A, C and G blocks in some de- 
tail. In addition to these blocks, separate processes are required 
for the alignment of the astrometric solution with the ICRS, the 
selection of primary sources, and the calculation of standard un- 
certainties; these auxiliary processes are discussed in Sect. [6] 

4.2. Least-squares approach 



Strictly speaking, Eq. ( [24] i is not a least-squares problem, be- 
cause of the weight factors wf^, wf" (as well as the excess 



noises , ef" ), which depend on the AL and AC residuals and 
hence on the unknowns (s, a, c, g). In Eq. ([T]) this dependence is 
formally included in the unspecified metric Ai, which therefore 
is not simply a (weighted) Euclidean norm. 



In principle, the minimization problem (24i can be solved 
by finding a point where the partial derivatives of the objective 
function Q with respect to all the unknowns are simultaneously 
zero. In practice, however, the partial derivatives are not com- 
puted completely rigorously, and the problem solved is therefore 
a slightly different one from what is outlined above. In order to 
understand precisely the approximations involved, it is neces- 
sary to consider how different kinds of non-linearities enter the 
problem. 

The functions t] f„g(jj, t \ c) and ^ f„g(fj., t \ c) appearing in 
Eqs. (25i-(26i are strictly linear in the calibration parameters 
c, by virtue of the basic geometric calibration model in Eq. ( 15 1 
or the generic model in Eqs. (20i-(21 1. On the other hand, the 



functions ri(t \ s, a, g) and ^(f | s, a, g) are non-linear in s, a, and 
g due to the complex transformations involved (Sects. 3.2 3.3 i. 
However, thanks to the data processing prior to the astromet- 
ric solution, the initial errors in these parameters are already so 
small that the corresponding errors in tj and ^ are only some 
0.1 arcsec (~ 10"^ rad). Second-order terms are therefore typi- 
cally less than 10"'^ rad ^ 0.2 yuas, that is negligible in compar- 
ison with the noise of a single AL observation (some 100 yuas). 
This means that the partial derivatives of the residuals Rf^ and 
Rf- with respect to all the unknowns do not change in the course 
of the solution. (In practice they are in fact recomputed in each 
iteration, although that is mainly done because it is more con- 
venient than to store and retrieve the values; nevertheless, this 
takes care of any remaining non-linearity, however small.) The 
non-linearities of the underlying astrometric, attitude, calibra- 
tion and global models are therefore not an issue for the mini- 
mization problem as such. 

The weight factors wi represent a different kind of non- 
linearity, potentially much more important for the final solution. 
These factors are introduced to make the solution robust against 
outliers, by reducing their influence on Q and hence on the es- 
timated parameter values (Sect. |5.1.2| l. Ideally, outlying obser- 
vations should not contribute at all to the solution (by having 
wi - 0), while 'normal' observations should receive full weight 
{wi - 1). In reality there will however be a transition region 
where the weight factors are between and 1 . Since the weight 
factors are in practice determined by the normalized residuals, 
Ri = Riicr^ + efy^^^, which in turn depend on the parameter 
values (s, a, c, g), it follows that the partial derivatives contain 
extra terms of the form R^dwi/ds, R^dwi/da, etc., that are non- 
zero for some observations. Analogous considerations apply to 
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the excess noise terms ef. they too are estimated by means of 
the residuals (Sects. 5.1.2 and 5.2.5 i, and therefore in principle 
introduce additional terms in the partial derivatives. 

We take the somewhat pragmatic approach of neglecting the 
terms depending on the partial derivatives of w/ and e; with re- 
spect to the unknowns when seeking the solution to the global 
minimization problem. The consequences of this approxima- 
tion can be appreciated by observing that the down-weighting 
(wi < 1) only kicks in when the absolute value of the residual 
exceeds a few times the total standard uncertainty, or some 0.5- 
1 mas for typical observations of bright sources. Similarly the es- 
timated 6/ are only sensitive to changes of the residuals of a simi- 
lar size. Experience with AGIS runs on simulated data show that 
the typical changes of the residuals fall below this level already 
in the first few iterations. Thereafter the weight factors and the 
estimated excess noises do not change significantly. Neglecting 
the derivatives of w/ and e/ in the global minimization problem is 
therefore equivalent to solving the weighted least-squares prob- 
lem with wi and e/ fixed at whatever values they settle to after the 
initial iterations. This is a reasonable assumption given that the 
statistical weight of any observation, and the size of the mod- 
elling errors, are not a priori expected to depend on the actual 
values of the parameters. The precise solution does of course de- 
pend on how wi and e/ are estimated, but that is an unavoidable 
consequence of any practical data analysis approach. 

4.3. Normal equations 



The minimization of Q in Eq. ( 27 1 is thus solved by the weighted 
least-squares method, assuming fixed weights Wi that are how- 
ever determined as part of the solution. The normal equations 
for the sources are given by ^dQ/ds = 0, and similarly for 
the other unknowns. Linearising around any reference point 
(s"'^, a"^', c"^', g'^'^^) within the linear regime of parameter space, 
i.e., setting s = s"^* + Xs, and similarly for the other un- 
knowns, and expanding to first order in the differential vector 
X - (x',,x'g,x'^,x'y, we find the normal equations as 



dRi dRi 
dx dx' 



Wi 



dx 



'R,(s"'\a'''\c"'\g''^)W,. (29) 



This can be written in matrix form as 

Nx^b, 



(30) 



where is a symmetric matrix. We now proceed to analyse the 
structure of this linear system of equations in terms of the previ- 
ously mentioned block updates S, A, C, G. 

The matrix and the vectors (column matrices) x, b can be 
partitioned into sub-matrices and sub-vectors corresponding to 
the different parameter vectors s, a, c, and g: 



(31) 



where Nas - A^jo' ^tc. Of importance here is that Nss and Naa 
have a particularly simple structure. Since s is sub-divided into 
vectors s,- of length 5 for the individual primary sources (/), it 
is natural to sub-divide Nss into blocks of 5 x 5 elements. From 





Nsa 


Nsc 










\bs] 


N,s 


^ aa 


N,c 


Na, 








ba 


N,s 


Nca 


Ncc 


Nc, 








be 


[Nss 




Nsc 


Nss\ 








ibsl 



Eq. ( 29 1 it follows that the (/, j)th such block is given by 



ds: ds' 



ydR.8R, 
dsi ds' 





if / 



if i^j. 



(32) 



where I e / signifies that the sum is taken over the observations 
of source /. The result for / j follows because no observa- 
tion I is associated with more than one primary source. A^jj is 
consequently block-diagonal, and N^jbs can trivially be com- 
puted for arbitrary vector bs by looping through the sources and 
solving the corresponding 5x5 system for each source 0This IS 
exactly what is done in the source update block (S). 

The vector of attitude unknowns is naturally sub-divided into 
vectors a„ of length 4, containing the elements of the quater- 
nions a„ that serve as coefficients in the B-spline representation, 
Eq. ( 10 1. If Naa is correspondingly sub-divided into blocks of 



4 X 4 e: 



ements, it follows from Eq. ( 29 1 that the (n, OT)th such 



block is given by 



INaaU^Y,^^^'- 



dRi dRi 
dan da' 



(33) 



With E denoting the left index of f; (Sect. 3.3 1, we have 
dRi/da„ = whenever n<i-M+loTn>{, where M is the 
order of the spline. It follows that [Naa]mn = if \n -m\ > M - 1 
(cf. Appendix |B.l| l. The non-zero blocks in Naa are therefore 
confined to the diagonal and M - 1 blocks above and below the 
diagonal (Fig.jsjl. Thus, N^^ba can be efficiently computed for 
arbitrary ba since the Cholesky decomposition of the matrix does 
not produce any additional fill-in (Appendix [C|. This system is 
solved in the attitude update block (A). 
In 



the geometric instrument model (Sect. 3.4 1, each 
CCD/gate and time interval combination (index ngk for example 
in Eq. [TSj l has its own set of unknowns. Moreover, a given ob- 
servation / can only refer to one CCD/gate combination. By the 
same reasoning as above, the sub-matrix Ncc is therefore block- 
diagonal, and A^^^' be can be computed for arbitrary be by looping 
over the CCD/gates combinations. This is done in the calibration 
update block (C). Although the number of calibration parame- 
ters per CCD/gate combination can be fairly large (~ 10"*), the 
resulting systems are well within the bounds that can readily be 
handled by direct matrix methods, even without taking into ac- 
count their sparseness. 

By definition all the global parameters affect each observa- 
tion, and the sub-matrix Ngg is therefore full. However, since the 
number of global parameters is never large, N^^bg can easily be 
computed, which is done in the global update block (G). 

In the description above we have implicitly assumed that 
each of the diagonal blocks Nss, Naa, Ncc, and A^g^ is non- 
singular, and even well-conditioned in order to avoid numeri- 
cal instability. This is equivalent to the statement that each of 
the blocks S, A, C and G is a well-posed problem: for exam- 
ple, that the determination of the source parameters is 'easy' if 
we assume that we know the attitude, calibration and global pa- 
rameters. This will in practice be guaranteed by the choice of 
primary sources (which will make [A^jj],, well-conditioned for 
every /) for the S block, and by the adopted attitude, calibration 
and global parametrizations, including the constraints necessary 
to render the updates unique - in particular the quaternion length 
normalization in Eq. ( 10 1 for the attitude model, and Eq. (22 1 for 
the calibration model. 

Turning now to the off-diagonal sub-matrices of A^, it is nat- 
ural to sub-divide for example the sub-matrix A^„j into blocks of 
4x5 elements corresponding to the 4 components of the quater- 



* Here, and elsewhere in this paper, an expression like A 'b is short- 
hand notation for solving the system Ay = b. The inverse matrix A"' is 
(usually) not computed, but only the solution vector j = A^'fr itself. 



L. Lindegren et al.: The astrometric core solution for the Gala mission 



13 



nion and the 5 astrometric parameters. The (n, 0th block, 

dR, dRi , 
da„ ds' 



(34) 



is non-zero only if source ; was observed in the time interval 
[t„,t„+m], which is the support of the B-spline B„{t). Nas is 
therefore very sparse, but it also has no simple structure because 
the distribution of the non-zero blocks is linked to the scanning 
law. Fortunately, as will be explained in Sect. 4.5 there is no 
need to explicitly compute, much less store, this sub-matrix, nor 
any of the other off-diagonal sub-matrices in Eq. ( 3 1 1. 



4.4. Rank of the normal equations 

From the nature of the astrometric observations, which are in 
effect differential within the (combined) field of view, and the 
modelling of the primary sources, which does not assume that 
any of their positions or proper motions are known a priori, it is 
clear that there is no unique astrometric solution to the problem 
as outlined above. The fundamental reason for this is that any 
(small) change in the orientation of the celestial reference sys- 
tem, as well as the introduction of a (small) inertial spin of the 
system, would leave all observations invariant. In principle the 
non-uniqueness of the solution is not a problem as such, since the 
resulting system of positions and proper motions are afterwards 
aligned with the ICRS by a special process (Sect. [6T| . However, 
it does imply that the normal matrix is in principle singularFl 
which may have consequences for the numerical solution of the 
normal equations. We say singular 'in principle' because arith- 
metic rounding errors will in practice prevent it from becoming 
truly singular, although it remains extremely ill-conditioned. 

More precisely, we expect to have rank n-6, if n - dim(A^) 
is the total number of unknowns. The null space of the matrix. 



N(N) = {v e R" I A^v = 0) 



(35) 



therefore has rank six. Indeed, it is easy to find six linearly in- 
dependent vectors vq, . . . , V5 that span the null space: the first 
three are found by introducing small changes in the orientation 
of the celestial reference system around each of its principal 
axes, and deriving the corresponding changes in s and a (c and 
g being independent of the reference system); the last three are 
correspondingly found by introducing a small inertial spin of the 
reference system around each axis (see Sect. 6.1.5 for details). 
Introducing the n x 6 matrix V 



(vo, 

A^y = 0. 



vg) we have 



(36) 



The singularity can in principle be removed by adding the six 
constraints V'x = 0, but in practice that is not necessary. It suf- 
fices to derive one particular solution x to the normal equations, 
then the whole solution space can be written x + Vz for arbitrary 

Z G 

tor (Sect 



^. The vector z is effectively determined by the frame rota- 
yielding the unique solution that best agrees with 



6.1 



the adopted definition of the ICRS. 

Quite apart from numerical rounding errors, it is not com- 
pletely true that A^ is mathematically singular. Stellar aberration 
and parallax introduce some absolute knowledge about the ref- 
erence system via the barycentric ephemeris of Gaia, which is 
expressed in ICRS and is not part of the adjustment process. 
However, since stellar aberration is at most 10 rad, the ori- 
entation error would have to be of the order of 1 mas for the 



' More precisely, is properly semidefinite, so that x'Nx > for all 
X i= 0, with equahty for some x 0. 



aberration effect to change by 0.1 yuas (say). So, although absent 
in principle, the indeterminacy of the reference frame orienta- 
tion and spin exists in practice. Since the orientation can be de- 
termined to much higher accuracy than 1 mas (by means of the 
optical counterparts of radio sources), the contribution of stellar 
aberration to the absolute frame knowledge can be neglected in 
practice. The same holds, a fortiori, for the much smaller paral- 
lax effect. 



4.5. The simple iteration step 
Consider the system 



K 





Nas Naa 

N N N 

cs ca cc 









N N N N 

gs ga gc g 



(37) 



where b is the same right-hand side as in Eq. (30i, and each 
stands for a zero-filled sub-matrix of the appropriate dimen- 
sions. Although this system is of the same size as Eq. ( [30| , is it 
fundamentally different in that it can be directly solved through 
a sequence of smaller systems. 



(38) 



where each sub-system is of the form S, A, C, G described 
above, allowing it to be solved with relative ease. (Naturally, the 
resulting solution d is also quite different from the x in Eq.[30|) 
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By means of Eqs. ( 29 1 and ( 34 1 the right-hand side in the second 
sub-system becomes 



ba N s 



/ 



dRi 
da 

dR, 
da 



7?K*"'',a'-^*',c'-^f,/'=') + ^rf, 

OS' 



Wi 



Riis''^ + d„ a''\ c''\ g"'')Wi , (39) 



ref ^ref „ref\ 



where the linearity of Ri(s, a, c, g) has been used in a Taylor ex- 
pansion around the reference values. This shows that the off- 
diagonal matrix Nas is not needed in order to compute the right- 
hand side of the second sub-system in Eq. ( 38 1, if only the resid- 
uals are computed after the source parameters have been updated 
by the solution of the first sub-system. Similarly, we find that the 
right-hand side in the third sub-system can be obtained from the 
residuals after updating both the source and attitude parameters, 
and so on. The off-diagonal sub-matrices in K are therefore not 
needed, provided that the sub-vectors of unknowns are succes- 
sively updated before the new right-hand sides are computed. 

From the above it is clear that a single AGIS iteration, con- 
sisting of the successive application of the four update blocks S, 
A, C and G, is mathematically equivalent to an updating of the 
unknowns by = K^b. In the context of iterative solution al- 
gorithms, the matrix K is referred to as the preconditioner of the 
normal equations system ( Axelsson|1996[ ). 

As previously noted, we assume that the block-diagonal ma- 
trices A^jj, Naa, Ncc, Ngg are all non-singular, and in fact pos- 
itive definite, for a proper formulation of the S, A, C, and G 
blocks. This ensures that the preconditioner K is non-singular, 
even though A^ is not. 

In the course of the iterations, new right-hand sides of the 
normal equations will be computed, while the matrix remains 
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essentially unchanged. From now on, let us express the residuals 
Ri not as functions of the parameter values s, a, c, g but in terms 
of the differential parameter vector x relative to the reference 
values. The original right-hand side, denoted ft*"', corresponds 
to the initial differential parameter vector a:**'' = 0. If (without 



a superscript) denotes the exact solution of Eq. ( 30 1, the initial 



error vector is e^^^ - jc*"' - x - -x. Although this vector is of 
course not known, we do know -A^c'"* - Nx - ft*"'. Solving 
the preconditioner system (37 1 gives rf*"' = AT"' ft*"' and the up- 



r(0) 



dated parameter vector jc*'' = jc*"' + rf*"'. The new error vector 
^(1) = jc*"-Jt: is again not known, but -A^e*" = ft*°'-A?rf<°' = ft*" 
is obtained by inserting the updated parameters in the right- 
hand side of Eq. (29 1, by the same argument as in Eq. (39i. 
Generalizing, we have the so-called simple iteration scheme 



ftW 



#+1) 



= 0, 1, 



(40) 



For the successive right-hand sides we find by recursion 

ft**+" = ftW - AfrfW = (/ - AfA:-i)ftW = B^+ift*"' , (41) 



where 

For the successive updates and errors we find 



where 



B = I-K-^N 



(42) 



(43) 
(44) 



(45) 



is called the iteration matrix ( Axelss onl|1996| l. Equations (41 
(45 1 are of great theoretical interest, as explained below, al- 
though none of them is used in the actual computations. 

The convergence behaviour of the simple iteration scheme 
can largely be understood by means of these relations and espe- 
cially in terms of the properties of the iteration matrix B gov- 
erning the sequence of updates d and errors e, and the adjunct 
matrix B governing the sequence of right-hand sides. It is well 
known (e.g.. Axel s son| 1996 1 that the simple iteration scheme in 
Eq. (40 1 converges (that is j:**^' — > jc) for arbitrary starting ap- 
proximation if and only if p{B) < 1 . Here, piB) is the spectral 
radius of B, i.e., the largest absolute value of its eigenvalues. 
Under this condition we have rf**' — > and e**' — > for A: — > oo. 
Also the right-hand side ft**' = Kd^''^ — > under the same con- 
dition0 



As discussed in Sect. 4.4 the normal matrix is singular 



and its null space spanned by the n x 6 matrix V. Therefore, 



BV 



V-K ^NV 



(46) 



which shows that B has a six-fold eigenvalue equal to 1, with the 
corresponding eigenvectors spanning NiN). The spectral radius 
of B is therefore not less than 1, and the simple iteration scheme 
does not converge for arbitrary initial errors. 



B and B = KBK ' have the same spectral radius; indeed, their 
eigenvalues are the same, as can be seen from the characteristic polyno- 
mial det(z/ - B) = del(K){zI - B) det(^r-') = det(z/ - KBK^) being 
the same for the two matrices (A. Bombrun, private communication). 



This is not a real problem, for the following reason. First, let 
us note that B is not a full-rank matrix. Indeed, a direct calcu- 



lation of Eq. ( 45 1 using the expression for K in Eq. ( 37 i shows 
that the first columns of B are zero Thus (at least) of its 
eigenvalues are identically zero. A corresponding set of orthog- 
onal unit vectors is given by the columns of the nxris matrix 



Z = 



rows 
n- n, rows 



(47) 



so that BZ - 0. We assume that the remaining n - -6 eigen- 
values satisfy < < 1 . Thus, if A is the diagonal matrix con- 
taining these eigenvalues and U a matrix of size nx(n - «, - 6) 
whose columns are made up of the corresponding eigenvectors, 
we have BU = UA. The columns of Z, U and V together span 
R", and it is therefore possible to decompose the solution vector 
as 



X - Zxs + Uy + Vz 

where Xs e R"' is the 'source' part of jr, j e ! 
Since -e*"' = jc we find 



(48) 



and z e 



Be**" = BZx, + BUy + BVz = UAy + Vz , (49) 



and by recursion 



■ e*^' = UA''y + Vz . 



(50) 



The first term clearly vanishes for ^ — » oo if p(A) < 1, as we 
have assumed. The second term, which is the projection of x on 
the null space of A^, remains unchanged by the iterations. The 
update vector can be written 



rf**' = e 



(i+i) , 



= [/A*(/ - A)y , 



(51) 



which vanishes under the same condition. The same is then true 
for the right-hand side ft**' = /(Trf**'. Effectively, this means 
that we can ignore the singularity of A^ in the simple iteration 
scheme; it will converge to some valid solution of the (singular) 
normal equations, and the convergence process can be monitored 
by means of the vectors rf**' and ft**'. After convergence, the re- 
quired null-space components can be found and added by means 
of the frame rotator 

To summarize, we have shown that the simple iteration 
scheme converges in the desired way, provided that the spectral 
radius of B, not counting the eigenvalues related to the null space 
of A^, is less than 1. It is well known (e.g., Golub & van Loan 
19961 that this condition is satisfied for any symmetric positive 
definite A^, using the Gauss-Seidel preconditioner. However, the 
spectral radius may be very close to 1, implying very slow con- 
vergence. In the case of AGIS the situation is more complex, 
and convergence is in practice demonstrated through simulations 
(Sect. |7]), but the theoretical background outlined above is of 
great help when interpreting the results. 



' That the first n , columns in the iteration matrix are zero means that 
the results of the next iteration are independent of the current source 
parameters xf \ This may seem surprising at first, but is a simple conse- 
quence of the fact that each iteration starts with the source update block 
(S). In this block, the updated source parameters depend on the previ- 
ous attitude, calibration and global parameters, but not on the previous 
source parameters. 
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4.6. Order of the block processes 

In the preceding sections we have assumed that the four blocks 
are always executed in the sequence SACG, and that the updated 
parameters resulting from each block is used in the subsequent 
blocks. The particular preconditioner K in Eq. ( [37] i incorporates 
these assumptions. We will now discuss variants of this scheme 
and show that they can be mathematically represented by differ- 
ent preconditioners. 

Each AGIS iteration always starts with the source update 
block (S). The main reason for starting with S rather than A, 
for example, is that the observations can be inspected and anal- 
ysed one source at a time in order to estimate the quality of the 
data, identify possible outliers, and check whether the source is 
suitable as a primary source. The identification of outliers, in 
particular, requires several 'inner' iterations of the source obser- 



Estimate using 



vation equations (Sect. 5.1.2 1, which can be done with relative 
ease because of the small number of data points involved. 

The order of the remaining blocks ACG adopted in the pre- 
ceding sections is more or less arbitrary. It is easy to see through 



an analogy with Eqs. ( 37 i-( 38 i that the sequence SCAG (for ex- 



ample) is mathematically represented by the alternative precon- 
ditioner 



K 


















Nac 










Ncr 














(52) 



and that other permutations of the sequence are similarly ob- 
tained by transposing the corresponding off-diagonal blocks. 
Changing the preconditioner means changing the iteration ma- 



trix (45 I and therefore possibly also its eigenvalues, which in 
turn govern the rate of convergence. 

From a mathematical viewpoint, the choice of the start- 
ing block (S in our case) determines the initial update rf^*'* 
(cf. footnote |9]l and therefore influences all subsequent updates. 
However, this particular choice is not expected to have much 
influence on the convergence rate after a number of iterations, 
since the S block has no special status in the periodically re- 
peated sequence . . . SACGSACGSACGSA. . . . Thus we may 
surmise that the different iteration matrices for the cyclically 
permuted sequences SACG, ACGS, CGSA and GSAC do in 
fact have the same set of eigenvalues. For symmetry reasons it 
can also be surmised that the reversed sequences GCAS, SGCA, 
ASGC and CASG have the same eigenvalues as the original se- 
quences. Thus, there are in fact only three sets of sequences with 
possibly distinct convergence behaviour, represented by SACG 
(or SGCA), SAGC (or SCGA), and SCAG (or SGAC) if we take 
S to be the starting block. There is no obvious way of knowing 
a priori which of the three possibilities is to be preferred, or even 
if they are significantly different. 

Apart from the various permutations discussed above, there 
is another way to modify the AGIS iteration scheme, which can 
also be described in terms of the preconditioner This modifica- 
tion is related to the practical organization of the flow of data to 
the different block processes. The current implementation of the 
simple iteration scheme differs somewhat from the description in 
Sect. 4.5 and the block updates are in fact practically organized 



as follows: 



1 . Initialize the iteration counter, k - 



2. Choose starting values for all unknowns: s^^\ a*"', c^'^\ 



y(0) 



3. Estimate using a*"', c^"', g" 



jik)^ f.{k)^ g(k)^ 

4. Estimate using c*, g*. 

5. Estimate c'*+'' using a®, g*. 



7. Increment k and go to Step 3. 



The crucial difference compared with Eq. ( 38 1 is that the C block 
in Step 5 does not use the updated attitude but the old one, and 
that the G block in Step 6 similarly uses the 'old' attitude and 
calibration parameters. This has the practical advantage that the 
A, C and G blocks can be carried out in parallel, with big sav- 
ings in terms of the amounts of data that have to be exchanged 
between the processes (see Sect. |7]i. Schematically, this can be 
represented as S[ACG], where the blocks in brackets are (or can 
be) executed in parallel (and so the order of the bracketed blocks 
does not matter). The corresponding preconditioner is 
















Naa 





























(53) 



Yet other variants may be considered, for example S[AC]G, 
where Step 6 uses the updated attitude and calibration param- 
eters, with preconditioner 



and [SACG], for which 



K 














N„s 


Naa 













Ncc 









Ngc 


















Naa 














Ncc 

















(54) 



(55) 



This last case is known as the (block) Jacobi method, while the 
use of a full triangular preconditio ner as in Eq. ([37] l is known as 
the (block) Gauss-Seidel method ( |Axelsson|1996| l. As we have 
seen, the currently implemented simple iteration scheme is in- 
termediate between the Jacobi and Gauss-Seidel methods. 

Intuitively, we expect the Gauss-Seidel method to converge 
more quickly than the Jacobi or any intermediate method, simply 
because each block then uses the most recent (and, presumably, 
best) estimates of the parameters. However, our practical expe- 
rience with AGIS shows that the 'difficult' part of the problem 
is to disentangle the source and attitude parameters. For exam- 
ple, the calibration parameters are generally found to converge 
much faster than the source and attitude parameters. Thus it does 
not seem to matter much if Step 5 above uses or a**', i.e., 
whether the sub-matrix Nca is included or not in the precondi- 
tioner A similar argument can be made concerning the G block, 
provided some measures are taken to decorrelate the global pa- 



rameters from the source parameters (Sect. 5.4 1. Thus, the var- 
ious intermediate methods are probably nearly as good as the 
Gauss-Seidel method, in terms of the convergence rate, and the 
precise scheme may then rather be determined by practical con- 
siderations. With the present data processing architecture, the 
favoured scheme is S[ACG] as described above. 



4.7. Accelerated iteration, conjugate gradients and the hybrid 
iteration scheme 

The 'simple iteration' (SI) scheme described above was the start- 
ing point for a long development towards a fully functional 
scheme with much improved convergence properties. The main 
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Stages in this development were the 'accelerated simple itera- 
tion' (ASI), the conjugate gradients (CG), and finally the fully 
flexible 'hybrid scheme' (A/Sl-CG) to be used in the final im- 
plementation of AGIS. As much of this development has at most 
historical interest, only a brief outline is given here. 

Already in the very early implementation of the simple iter- 
ation scheme it was observed that convergence was slower than 
(naively) expected, and that after some iterations, the updates 
always seemed to go in the same direction, forming a geometri- 
cally (exponentially) decreasing series. With the hindsight of the 
analysis in Sect. 4.5 this behaviour is very easily understood: the 
persistent pattern of updates is roughly proportional to the eigen- 
vector of the largest eigenvalue of the iteration matrix, and the 
(nearly constant) ratio of the sizes of successive updates is the 
corresponding eigenvalue. From this realization it was natural to 
test an acceleration method based on a Richardson-type extrap- 
olation of the updates. The idea is simply that if the updates in 
two successive iterations are roughly proportional to each other, 
dik-i-i) ^ ^dW^ with \A\ < 1, then we can infer that the next up- 
date is again a factor A smaller than rf**^'', and so on. The sum 
of all the updates after iteration k can therefore be estimated as 
rf(*^-*-i> + + /l^rf^'^+i) + ... = (!_ ^)-irf(^+i). Thus, in iter- 

ation A- -H 1 we apply an acceleration factor 1/(1 - A) based on 
the current estimate of the ratio A. This accelerated simple it- 
eration (ASI) scheme is seen to be a variant of the well-known 
successive overrelaxation method ( |Axelsson|1996| ). The factor A 
is estimated by statistical analysis of the parallax updates for a 
small fraction of the sources; the parallax updates are used for 
this analysis, since they are unaffected by a possible change in 
the frame orientation between successive iterations. With this 
simple device, the number of iterations for full convergence was 
reduced roughly by a factor 2. 

Both the simple iteration and the accelerated simple itera- 
tion belongs to a much more general class of solution methods 
known as Krylov subspace approximations. The sequence of up- 
dates d^''\ k - . . . K - 1 generated by the first K simple itera- 
tions constitute the basis for the /T-dimensional subspace of the 
solution space, known as the Krylov subspace for the given ma- 
trix and right-hand side (e.g., |Greenbauin]| 1 997 [ |van der Vorst| 
20031). Krylov methods compute approximations that, in the kth 
iteration, belongs to the A:-dimensional Krylov subspace. But 
whereas the simple and accelerated iteration schemes, in the A:th 
iteration, use updates that are just proportional to the kth basis 
vector, more efficient algorithms generate approximations that 
are (in some sense) optimal linear combinations of all k basis 
vectors. Conjugate gradients (CG) is one of the best-known such 
methods, and possibly the most efficient one for general symmet- 
ric positive-definite matrices, (e.g., | Axelsson| 1 996, ,Bj orck| 1 996[ 
van der Vorst|2003 i. Its implementation within the AGIS frame- 



work is more complicated, but has been considered in detail by 
|Bombrun et al.| ( [201 1} . As it provides significant advantages over 
the SI and ASI schemes in terms of convergence speed, this algo- 
rithm has been chosen as the baseline method for the astrometric 
core solution of Gaia (see below however). From practical expe- 
rience, we have found that CG is roughly a factor 2 faster than 
ASI, or a factor 4 faster than the SI scheme. Like SI, the CG 
algorithm uses a preconditioner and can be formulated in terms 
of the S, A, C and G blocks, so the subsequent description of 



these blocks remains valid. In the terminology of Bombrun et al. 
( 2011| l the process of solving the preconditioner system Kd = b 



is the kernel operation common to all these solution methods, 
which only differ in how the updates are applied according to 
the various iteration schemes. 



The CG algorithm assumes that the normal matrix is con- 
stant in the course of the iterations. This is not strictly true if 
the observation weights are allowed to change as functions of 
the residuals, as will be required for efficient outlier elimina- 
tion (Sect. 5. 1.2 1. Using the CG algorithm together with the 



weight-adjustment scheme described below could therefore lead 
to instabilities, i.e., a reduced convergence rate or even non- 
convergence. On the other hand, the SI scheme is extremely sta- 
ble with respect to all such modifications in the course of the 
iterations, as can be expected from the interpretation of the SI 
scheme as the successive and independent application of the dif- 
ferent solution blocks. The finally adopted algorithm is there- 
fore a hybrid scheme combining SI (or ASI) and CG, where SI 
is used initially, until the weights have settled, after which CG 
is turned on. A temporary switch back to SI, with an optional 
re-adjustment of the weights, may be employed after a certain 
number of CG iterations; this could avoid some problems due to 
the accumulation of numerical rounding errors in CG. 



5. Updating processes 

In this section we describe in some detail each of the updating 
blocks S, A, C and G that form the basis (or kernel process) for 
the AGIS iteration loop. 



5.1. Source updating (S) 
5.1.1. The normal equations 



The astrometric model for the sources is given in Sect. 3.2 In the 
source update block (S) the source parameters s are improved 
by solving the first line in Eq. (38 i. According to Eqs. (|29]) and 



( 32 1 this can be done for one source (/) at a time by solving 



the following system of equations for the update rf, of the five 
astrometric parameters in s,-: 



2 

lei 



dRidR,^^ 
osi as. 



lei 



dsi' 



(56) 



Here Wi - wi/{aj + ef), with cr; denoting the given formal stan- 
dard uncertainty of observation I, expressed as an angle, w/ and 
6/ are, respectively, the downweighting factor and excess noise 
introduced in Sect. 3.6 In Eq. (56 1 the dependence of Ri on a, c 
and g has been suppressed, since the system is solved with these 
parameters fixed. 



In matrix notation, the normal equations (56 1 can be written 



A-WiAidi = A'lWihi., 



where 



dRi 



(57) 



(58) 



lei 



is an n,- x 5 matrix with n,- = 1 the number of observations 
of source / (typically n, ~ 800), 



hi = [Ri(si)h 



(59) 



is a column matrix of length n,, and W, is a diagonal matrix con- 
taining the weight factors W/. Although the residuals Ri are in 
principle non-linear functions of s,, this non-linearity can be ne- 
glected if the parameters are close enough to the final solution, 
i.e., if the resulting update di is small enough. The partial deriva- 
tives in A, can then be regarded as fixed throughout the updating 
process, and A, and hj can immediately be computed when en- 
tering the source updating. The weight matrix W,, on the other 
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hand, depends on w; and e/, which are modified as part of the 
process as explained below. 



For given W, the system of equations ( 57 1 is solved using 
the Cholesky algorithm (Appendix [C]!. At the end of the source 
updating, the full (5 x 5) inverse matrix is computed, providing 
a first-order estimate of the covariance of the astrometric param- 
eters in Sj (see Sect. |6.3|l . 

We note that Eq.lSTb corresponds to the overdetermined sys- 
tem of observation equations 



A, di - hi (weight matrix Wi) . 



(60) 



After solution, the updated residuals are contained in the vector 
Ri - hj-Aidj, and the contribution of the source to the objective 
function Q is given by 



Qi = R'iWiRi 



(61) 



5.1.2. The inner iteration: identifying outliers and estimating 
tine excess source noise 

Since the weight matrix W, depends on the downweighting fac- 
tors wi and the excess noises e/, which in turn depend on the 
updated residuals the normal equations (jSTjl are non-linear 
and must be solved by iteration. We refer to this as the inner 
iteration of the source update, to distinguish it from the AGIS 
iteration discussed in Sect. 14.: 



Using Eq. (28 i we can write 



Wl 



o-j + e«(f/) + ef 

ll/2 



(62) 



where cr/ - ^crf + e^(f;)j is the formal standard uncertainty 
of the observation adjusted for the excess attitude noise, which, 
when entering the source update, is assumed to be known from 
a previous attitude update (Sect. 5.2.5 i. In the very first source 



update the excess attitude noise must be set to the mean errors 
of the pre-AGIS attitude parameter estimates which are used for 
starting up the iterations. The downweighting factors depend on 
the normalized residuals, i.e.. 



Wl - w 



Ri 



(63) 



where the function w(z) is such that w(z) = 1 for |z| < 3 and 
gradually decreasing to for larger |z| (see Eq. [66] ). 

The inner iteration actually consists of two nested proce- 
dures: an outer one which determines the downweighting factors 
Wl, and an inner one which determines the excess source noise 
e, for a fixed set of w;. We begin by considering the inner proce- 
dure. 

For fixed downweighting factors, the source update aims to 
minimize Q, in Eq. (61 1 with respect to the unknown d, and e,. 



But it is immediately seen that Qi can be made arbitrarily small 
just by making e, large enough. Consequently, we cannot use 
unconstrained minimization to solve this problem. The neces- 
sary constraint is provided by the condition that Qi, under the 
assumption that the excess noises have been correctly estimated 
and the outliers properly removed, should follow the chi-square 
distribution with v = n,- - nout - 5 degrees of freedom. Here n,- is 
the number of observations of source /, Wout the number of out- 
liers and 5 the number of astrometric parameters estimated. In 
particular, the expected value is E{Qi) - v. The number of out- 
liers «out is estimated by counting the number of observations 



with Wl < 0.2. This limit was empirically found to give a rea- 
sonable estimate of the actual number of outliers in a variety of 
simulated cases. 



For given e, the weight matrix is now known, Eq. ( 57 i can be 
trivially solved and the residuals /?, computed. We thus define 
the function Q{ef) - R'.WiRi. The excess source variance is then 
taken to be 







ife(0)< V, 



solution of Qief) - v otherwise. 



(64) 



In the second case, the non-linear equation Q{y) = v is iteratively 
solved by a series of improvements Ay - (l- Q(y) / v) Q{y) / Q'(y), 
starting from y - 0. Typically, 2-3 iterations are sufficientFj 

This procedure returns a positive e,- as soon as 2(0) > v. If 
the resulting e, is much smaller than the typical cr; of the source, 
it is probably not significant. The significance of the e, can more 
easily be judged from an auxiliary statistic that can be computed 
almost for free. Under the null hypothesis (e, = 0) we know that 
6(0) ~ xl^ so the expected value is v and the variance 2v. Thus 
we may take 

^ ' (65) 



D 



V2^ 



as a measure of the significance of the estimated e,, with D > 2 
indicating a probably significant value. 

Having determined e,, and a corresponding set of residuals 
Ri, the downweighting factors can immediately be computed 
from Eq. ( 63 i. However, having changed the downweighting fac- 
tors (and possibly v) it is now necessary to repeat the estimation 
of e, and d, with the new set w/. Typically, four such iterations 
are found to be sufficient. The only remaining problem is how 
to start the iterations, that is the initial selection of the down- 
weights Wl. It is not possible to start by assuming wi - 1 for 
all the observations, since we must take into account that some 
small fraction of the data could be utterly wrong. Such gross out- 
liers, if not removed already from the start, would severely slow 
down or even prevent the convergence of the inner iterations. 
The adopted solution is to make an extremely robust estimation 
of the standard deviation of the initial residuals (contained in hi), 
from which the initial downweightings are obtained. This robust 
standard deviation is calculated as half the intersextile range 
of the elements in hi, whereupon the initial w/ = w(hi/gi). 

After convergence of the inner iteration, the statistical weight 



of the source Wi is computed according to Eq. ( 1 19 1. This quan- 
tity, together with e, and the magnitude, are the most important 



indicators for the selection of primary sources (Sect. [6.2.2 1 
The weight function w(z) currently used is the following 



w(z) = 



1 

1 - 1.773735f2-H 1.141615f^ 
[exp(-k|/3) 



if kl < 2 

if2<|z|<3 (66) 
otherwise. 



where t - \z\ - 2 and the numerical constants have been chosen 
to make a smooth transition at |z| = 3. The exponential decay for 
|z| > 3 provides a dramatic weight reduction for large residuals; 
e.g., at 10 sigmas we have w(10) - 0.036, while at 100 sigmas 
we have w(lOO) ^ 3 x 10"'^ 



This iteration formula can be derived by matching the rational ap- 
proximation Q(y) ^ a/(b + y) to the value and derivative of Q{y) at the 
current point y. Compared with the standard Newton-Raphson method, 
which uses a hnear approximation around the current y, the present for- 
mula converges much quicker due to the more reasonable behaviour of 
the rational approximation especially for large y. 
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Among the many different weight functions proposed in the 
literature, the so-called Huber estimator ( Huber|198T| l using 



wh(z) = 



1 if kl < c 

c(2\z\ - c)z^^ otherwise 



(67) 



has been considered as an alternative to Eq. (66 1, e.g., with c - 2. 
This gives a much slower weight decay for large residuals, e.g., 
wh(10) = 0.36 and wh(IOO) - 0.0396. Future experiments may 
decide which weight function will finally be used for AGIS. 

5.1.3. Calculation of partial derivatives 



The calculation of the partial derivatives in Eq. (58 1 is done as 
follows. From Eqs. ( [25] l and (|26]) we have, by means of ( [T3] l, 



ds! 



ds! 



' ds' 



do 
ds! 



(68) 



In analogy with Eq. (jSj) we introduce the auxiliary vectors mi - 
(z X M;) and n/ - U/X mi, which together with M/ form the normal 
triad {mi iii m/] with respect to the SRS; its components in the 
SRS are given by the columns of the matrix 



S'[m, «, «,] = 



-simpi -sin^/cos^/ cos ^; cos ^/ 
cos (fi - sin sin tfi cos sin (fi 
cos sin 



By differentiation of the last column we obtain 

dui 
d^ 



dfi „ , d^ 
mi — cos + ni — 
ds ds, 



and thus 



dRf^ 
~d7' 



, dui 



dR^"" 

ds! 



dui 

d^ 



(69) 



(70) 



(71) 



These expressions can be evaluated in any coordinate system, 
but perhaps most conveniently in the SRS using S'm; and S'«/ 
from Eq. ( 69 1, and 



d{S'ui) 
ds' ' 







d(C'ui) 
ds! ' 



q/ 



(72) 



from Eq. ( 1 1 



obtain from 
as: 



. To compute the partial derivatives of Cm,, we first 
!q. ^ the derivative^ of the coordinate direction 



dui 

da^i 



dui 



dSi 
dui 



dui 

dmi 



^-(I-rir',)bG(ti)/A,, 



dHa*i 



■ PiTl , 



diii 

dusi 



du 



- — - (/ - rir'i)bGiti)miTi/Au - {piHa*i + qmdr] ■. 



(73) 



where t/ - fB/-fep for brevity, and we have used the normal triad 
in the celestial reference system, Eq. (j5]). Although usually only 
the first five derivatives are needed (see however Sect. |6.3| l, the 
last equation gives, for completeness, the derivative with respect 
to the sixth astrometric parameter the first term corresponds 



" As indicated by the asterisk in the first derivative in Eq. ([73|, the 
differential in right ascension is a true arc, thus: da,i = (Sa,) cos <5, . The 
corresponding update in right ascension, i.e., the first element of the rf, 
obtained by solving iSll, is therefore Aoj cos 5,. 



to the secular change in parallax and the second to the perspec- 
tive acceleration. 

The rigorous transformation from m/ to m/ is quite complex, 
but by far the largest difference (~ 10"^ rad) is due to stellar aber- 
ration. By comparison, gravitational light deflection by the Sun 
is ~2 X 10"^ rad. While the rigorous transformation is required 
to compute the vector M/ itself, some simplifications can be ac- 
cepted when computing the partial derivatives. Indeed, for this 
purpose it is sufficient to consider the classical stellar abeiTation 
formula. 



ut ^ {ui + vc(ti)c > , 



(74) 



accurate to first order in vg/c, where vg - dbc/dt is the barycen- 
tric coordinate velocity of Gaia and c the speed of light. To a 
relative precision better than 10"^ we then have 



dui 

d^ 



« Vg 



c 



dui 

d^ 



(75) 



where / is the 3x3 identity matrix. 

5.2. Attitude updating (A) 

5.2.1. The normal equations 

The attitude model using B-splines to represent the components 
of the attitude quaternion as functions of time is described in 
Sect. 



3.3 



(For reference purposes, conventions for notation and 
some important properties of splines and B-splines are explained 
in Appendix B.l ) In the attitude update process (A) the attitude 



parameters a are improved by solving the second sub-system in 
Eq. (38 I. Recalling that a is divided into sub-vectors of length 



4, representing the quaternions a„ in Eq. ( 10 1, the nth set of four 
equations can be written, using Eqs. ( 29 1 and ( 33 1, 



Z Z 



leL.nL,, 



dRi dRi 
da„ da;„ 



-Wi 



d,„ = - ^ -^Ri(s,a,c,g)Wi 



da„ 



(76) 

Here L„ stands for the set of observations occurring within the 
support of B„{t), i.e., L„ - {1\t„ < ti < t„+m), where M is 
the order of the spline. On the right-hand side, it is understood 
that the residuals Ri are calculated for the most recent source 
parameters s (i.e., from the preceding source updating), while 
the attitude, calibration and global parameters are the not-yet- 



updated ones, as explained in Sect. 4.6 The weights Wi are the 
ones calculated in the source updating. 

The structure of the symmetric matrix Naa is shown in Fig.|5] 
If each quaternion component is represented by a spline of order 
M with degrees of freedom (so« = O...A^-l), the total num- 
ber of unknowns is AN and the average bandwidth of Naa (count- 
ing non-zero elements from the diagonal up) is 4(M - 1) -H 2.5. 
Including the light-hand side, the total number of reals that need 
to be stored for the normal equations is therefore ^ (16M - 2)N 
or 62A^ for cubic splines. With a knot interval of about 15 s, 
about 3 MB is required to store the attitude normal equations 
for one day of observations. Thus it is completely realistic to 
store the attitude normal equations for the entire mission in pri- 
mary memory. Cholesky factorization of the normal equations 
does not produce any more non-zero elements in the matrix; the 
factorization and solution can therefore use the same storage as 
the normal equations. Moreover, since the number of arithmetic 
operations grows only linearly with A^, it is computationally fea- 
sible to solve the normal equations for any stretch of data. 
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Fig. 5. Structure of the attitude normal equations matrix Naa for 
a cubic spline (order M - A). The blocks of size 4x4 are in- 



dexed by (n, m) as in Eq. (33 i. The grey cells represent non-zero 
elements. Since the matrix is symmetric, the elements below the 
diagonal (in lighter grey) need not be stored. 



last B-spline to which 
observation at t^^^ 
contributes: BJ^t) 



first B-spline to which 

observation at 
contributes: S^_^,{f) 




Fig. 6. A natural break in the definition of the attitude spline oc- 
curs if there is a gap in the observations containing at least M 
knots, where M is the order of the spline, fi^st is the time of the 
last observation before the gap, ffjrst the time of the first observa- 
tion after the gap. 



5.2.2. Segmentation of the data 

Even though it is feasible to treat the complete set of normal 
equations for the attitude updating as a single system, it is desir- 
able for several reasons to divide up the data temporally. This 
allows one to set up a very straightforward and efficient dis- 
tributed attitude updating, simply by handing out the processing 
of different time segments to different computing nodes. Also 
the inspection of residuals in order to detect stretches of bad fit 
(caused, for example, by micrometeoroid impacts), and the sub- 
sequent reprocessing of these stretches, is greatly facilitated if it 
can be done on shorter data segments. 

The spline model is capable of interpolating sensibly (if not 
accurately) over short data gaps. However, if the data gap con- 
tains at least M knots (with M - A for cubic splines), the two 
splines on each side of the gap become completely disconnected. 
This is illustrated in Fig. [6] where n and m are the left indices of, 
respectively, the last observation before the gap (fiast) and the 
first observation after the gap (ffii-st). B„{t) and B„i-M+\{t) are the 
last and first B-splines whose coefficients depend on the obser- 
vations before and after the gap. Clearly the two segments of the 
attitude spline are disconnected \fn<m-M+\ or m - n> M. 
We call this a natural attitude break. 

In the absence of natural breaks, artificial ones can be in- 
troduced at suitable intervals by a simple method and without 





knots for 
segment 
K 


Segment K 
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anterior 
knots for 
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Fig. 7. Illustrating the assignment of knots for the attitude update 
solutions in two consecutive segments K and K+\, with a break- 
point at knot index x and using 2L overlapping knot intervals. 



any significant loss of accuracy. The idea is to make separate so- 
lutions for overlapping segments, as illustrated in Fig. |7] The 
segments use a common knot sequence [Tk] that may extend 
over the whole length of the mission. Each segment K defines 
an attitude spline in the interval [fbeg(^), fend(^)], based on data 
with observation times in that same interval. The endpoints co- 
incide with certain knots in such a way that fend(^) - t.i+l and 
t\isg{K + 1) = Ti-L, where Tx is the cross-over knot between seg- 
ments K and K+ \ and 2L the number of overlapping knot inter- 
vals. The anterior and posterior knots for each segment are also 
taken from the common knot sequence. The local character of 
the splines means that the resulting fit around is practically 
the same for the two segments, provided L is large enough. For 
cubic splines (M = 4) it is found that L - 12 is sufficient. 



Each segment gives a system of normal equations ( 76 1 for 
the updates d„ to the attitude parameters a„ for a certain range 
of index n. For example, with reference to Fig. |7] in segment K 
updates are computed up to and including n - x + L-\, while in 
segment K+ 1 updates are computed starting with n - X-L-M+ 
1 . At least in the middle part of the overlap region, the updates 
for a given n should be essentially the same in the two segments. 
It therefore does not matter much which of the results is chosen. 
The mid-point is at index n = x - M/2 if M is even, or half-way 
between x-{M + l)/2 and x - (M - l)/2 if M is odd. We may 
therefore agree to use the solution from segment K to update a„ 
up to and including index n = x - [M/l], while the solution 
from segment K + I is used starting with n - x - [M/l] + 1. 
The important thing is that no n is missed by the updating, nor 
updated twice. 

Once all the coefficients a„ have been updated from the dif- 
ferent segmented solutions, the segmentation loses its meaning 
and can in principle be forgotten. For example, when evaluating 
the attitude at a specific time f, it does not matter to which seg- 
ment that instant belonged. In the next iteration of the attitude 
update, a different segmentation can in principle be used. 

The overlapping segments mean that a fraction of the obser- 
vations need to be processed twice in the attitude updating. The 
fraction equals the ratio of the overlap length to the mean length 
of the segment, and increases the shorter the segments are made. 
For example, with a segment length of one day (it would not 
seem reasonable to have shorter segments) and a mean knot in- 
terval of 15 s, the fractional overlap for L = 12 is only 0.4%. 

5.2.3. Calculation of partial derivatives 



For the partial derivatives we obtain in analogy with Eq. (71 
dRf^ 



dqi 



, dui 
m, — sec /■/ , 
' dq, ^' 



dR 



■AC 



, dui 



dqi ' dqi 
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where the derivatives with respect to the attitude quaternion 
should be interpreted componentwise. Differentiating Eq. ( [TT] i 
and using that d(C'H/) = 0, we have 



{d(S'«,), 0} = 2{S'h,, Ojq^'dq,, 
which after some manipulation gives 



(78) 



dRf^ 
dqi 



dR 



AC 



= -2sec^/q/{S'«/,0}, - = 2q/{S'm/,o}. 



(79) 



The derivatives with respect to the spline coefficients a„ are ob- 
tained after multiplying the above expressions by B„(f;), assum- 
ing that the normalization factor in Eq. (10 1 is close to unity 
(Sect.|5;24|. 

5.2.4. Constraining tlie attitude updating 

Since the attitude is represented by a unit quaternion, its compo- 
nents should at all times satisfy q^^ + + q\ + - 1 . All four 
components are nevertheless needed, for while the magnitude of 
any of them can be inferred from the other three components, 
its sign cannot. The redundancy of the representation manifests 
itself in that the length of the quaternion cannot be determined 
from the observations. Indeed, as can be seen from Eqs. 



11 



( 12i, applying an arbitrary non-zero scale factor to the attitude 
quaternion q has no effect on the computed instrument angles, 
and is therefore unobservable. The attitude parameters a„ are 
therefore also undefined with respect to a certain scale value. As 
a consequence, the normal matrix Naa computed from Eq. (76 1 
is singular, and constraints are needed for computing a unique 
solution. 

was introduced to guarantee 



The normalization in Eq. (10 



that the calculated quaternion is a ways of unit length, although, 
as we have seen, this is not strictly necessary for some of the 
subsequent calculationsf*^ 

Naively, one might expect that the coefficients a„ could be 
scaled independently for each B-spline, i.e., that different scale 
factors could apply to each index n. This is not the case, how- 
ever At any time, the (non-normalized) attitude quaternion is a 
linear combination of M adjacent coefficients a„, and unless all 
four coefficients are scaled by exactly the same factor, the result 
will not be a simple scaling of the quaternion}^ Applying this 
argument to every observation time f/, it is readily seen that the 
same scaling factor must be used for all the coefficients in any 
attitude segment without natural or artificial breaks. In principle, 
therefore, the attitude normal matrix for such a segment has a 
rank defect of 1 (that is, the rank is one less than the number 
of attitude parameters), and would only need a single constraint 
equation to become non-singular. 

Numerical experiments, using Singular Value Decomposi- 
tion (SVD; see, e.g., Golub & van Loan|1996| l of the matrix Naa 
computed from simulated observations over successively longer 
time intervals, indeed show the expected rank defect of 1 for 
intervals up to several hundred knots. That is, there is a clear 
gap (of several orders of magnitude) between the smallest singu- 
lar value and the second smallest one. For longer time intervals 



On the other hand we have implicitly assumed ||q|| 
in deriving Eq. 1 79 1. 



1, for example 



10 1 is effected by applying a nor- 



Note that the normalization in Eq. 
malization factor that is a continuous function of time. Since any given 
spline coefficient a„ is used in a finite time interval (namely, the support 
of the corresponding B-spline), one cannot obtain a correct normaliza- 
tion for all t by scaling the spline coefficients by some value depending 
on n. 



the gap gradually closes and the problem thus becomes ill-posed 
( Hansen|1998 1. Thus, any reasonably long time interval will in 
practice require some form of regularization rather than the ap- 
plication of just a single constraint. 

The adopted solution method is a variant of the well known 
Tikhonov regularization ( |Hansen|1998| l. The objective function 
in Eq. ( [27| is modified to include a term depending on the devi- 
ation of the normalization factor in Eq. ( 10 1 from unity for each 
observation. We write the deviation as 



D,= \ 



^ a„B„(f/) 



n=C-M+\ 

and the modified objective function as 

e(s,a,c,g) = ^(7;2 + i2^2)lV,, 



(80) 



(81) 



where /I is a small but non-zero regularization parameter. We 
have found that A - to 10"^ gives a solution that is always 
numerically stable, and quite insensitive to the precise value of 
A. As a result, the normal equations ( 76 1 become 



n+M-l 



leL,,nL„. 



^ Ua„5^^ 5a„5^;l ' 



^IdRi .dDi \ 



The required partial derivatives, obtained from Eq. ( 80 1, are 

- ^ = 2 J] aMt,)B„(ti) - 2q(f,)B„(f,) , (83) 



where the approximation makes use of the fact that the normal- 
ization factor in Eq. ( fTO] ) is close to unity. 

5.2.5. Estimating tine excess attitude noise 

The excess attitude noise e„(0 introduced in Eq. ( [28] l accounts 
for modelling errors in the attitude representation. Such errors 
could be caused for example by (unmodelled) micrometeoroid 
impacts, 'clanks' due to sudden redistributions of satellite iner- 
tia, propellant sloshing, thruster noise, or mechanical vibrations 



(Appendix D.4 1. Due to the cubic spline representation, any lo- 



calized effect that cannot be fitted by the spline will result in 
systematic residuals that span over a few consecutive knot in- 
tervals. Indeed, discontinuities in the rate (e.g., from micromete- 
oroid impacts) or angle (e.g., from clanks) produce characteristic 
patterns of residuals that can be used to identify such events. A 
significant effort will be devoted to the possibly semi-manual 
and interactive process of finding these events. When identi- 
fied, they can be handled for example by modifying the knot 
sequence (Sect. 5.2.6 1. But even after this process, the model 
will be imperfect due to for example high-frequency thruster 
noise|^ Similarly, there will be a large number of impacts that 
are too small to be individually recognized; collectively they 
add some unmodelled attitude errors, which £a{t) may account 
for. However, it should be noted that ea{t) does not include any 



'High-frequency' here means roughly the range 1/2At ^ 0.03 Hz 
to 0.2 Hz, where At is the typical spline knot interval (~15 s); lower 
frequencies are absorbed by the spline and higher frequencies are 
smoothed out by the integration across the CCD. 
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component of the observation noise (principally from CCD pho- 
ton noise), nor is it an estimation of the attitude uncertainty (cf. 
SectlO). 

Three components of the excess noise, designated e/^i^it), 
eACp(f)> ™d eACF(f)^ need to be derived independently of each 
other, representing modelling errors in the AL attitude, the AC 
attitude of the preceding field of view, and the AC attitude in the 
following field of view. The three components are statistically 
nearly independent thanks to the way the attitude measurements 
are made, and the fact that the basic angle is not far from 90°. 

The algorithm to estimate ea(t) (for a - AL, ACP or ACF) 
is based on a simple statistical processing of the residuals Ri 
derived in the source updating. The time line f is divided into 
'buckets' [tj, tj+i) such that each bucket (j) will contain a suffi- 
cient number of observations, also in the AC direction. The size 
(duration) of a bucket should be several knot intervals for the 
attitude spline, but the boundaries tj need not in any other way 
be related to the attitude knot sequence. One set of buckets is 
needed for each attitude component (AL, ACP, ACF). Let I e ja 
signify that observation / belongs to bucket j and attitude com- 
ponent a. After having completed the source updating for all pri- 
mary sources, the excess attitude noise in bucket j is estimated 
as 



e^itj < t < tj^i) = max p, Fo.68(^/ - cr; 



leja 



(84) 



where Fo.esO is the 68% quantile (68th percentile) of the argu- 
ment values. It is important to note that the downweighting fac- 
tors w; determined during the source updating are not used here 
to eliminate possible outliers; this function is instead taken care 
of by using the quantile to compute a robust estimate of the typ- 
ical excess variance in the attitude bucket. This means that if 
the 'outliers' detected by the source update were actually caused 
by a stretch of bad attitude, then this will be recognized by a 



large value of the quantile in Eq. ( 84 1, and consequently by an 
increased e^. 

In the subsequent attitude update, the downweighting fac- 
tors wi are re-computed based on the residual from the previous 
source update but with a value for the total variance, (T^+cr^ + e^, 
using the newly estimated e^^. Thus, only the 'true' outliers - that 
are not due to the bad attitude - are now down weighted. The data 
may thus contribute to the attitude updating even if they had been 
flagged as outliers in the preceding source updating. 

The functions £a(t) are obviously an extremely useful diag- 
nostic for the progress of the AGIS iterations as well as (after 
convergence) for the quality of the attitude modelling and data. 
They can be plotted as a function of time, and the quantity of data 
is such that human inspection is feasible. They are also needed 
for setting the detection threshold for micrometeoroid impacts. 

The accumulation of statistics in the buckets is best done in 
parallel with the source updating, when the residuals are readily 
at hand. One remaining problem is how to compute the quantile 
in Eq. (|84]) in an efficient way, without having to store billions of 
residuals. Indeed, exact calculation of quantiles would require to 
store all the values Rf - cr^ - ef in a bucket before the quantile 
can be computed. However, if we are content with an approxi- 
mate estimate of the quantile, there are a number of sequential 
estimation algorithms available that only need to store a much 
smaller amount of data per bucket, see for example 'Greenw ald] 
|& Khanna (2001), [Gilbert et aT] ([200211 and references therein. 
We have chosen to use the Incremental Quantile estimation al- 
gorithm due to |Chambers et al.|p006j ). 



5.2.6. Initialization of the attitude parameters 

An approximate estimate of the attitude is already provided by 
the initial data treatment (IDT) preceding the astrometric solu- 
tion. This may be given as a discrete time series, for example 
one quaternion every second of time. The first time the attitude 
update is executed for a certain time interval, a regular knot se- 



quence is set up and the B-spline coefficients a„ in Eq. (lOi 
are determined by a least-squares fit. For a given time series 
of attitude estimates, this is a linear problem and therefore eas- 
ily solved. The resulting initial attitude a'"* is used in the first 
source update (S) and subsequently improved by the attitude up- 
date process (A) as part of the AGIS iteration scheme. 

By default, a regular knot sequence is adopted, i.e., the knot 
interval At - t„+i -t„ is taken to be more or less constant. Given 
the endpoints fbeg, fend of a data segment, the knots are set up at 
regular intervals respecting a given maximum value of At (of the 
order of 5 to 20 s). The assignment of knots must also take into 
account the need for anterior and posterior knots, as discussed in 
Appendix [B| and in the case of segmented data, the overlapping 
knots as discussed in Sect. 15.2.2] 

Occasionally the knot sequence needs to be redefined as a 
result of the adjustment process. Possible causes could be: 

- If the spline is not flexible enough to accurately model the 
data, it may be necessary to decrease the maximum allowed 
At. 

- Conversely, overfitting of the data may require the maximum 
allowed At to be increased. 

- Locally, a scarcity of accurate data or a short gap could make 
it necessary to remove some knots or introduce a natural 
break in the attitude representation (Sect. 5.2.2 1. 



Very locally, a bad fit may result from a micrometeoroid hit 
causing an almost instantaneous change in the angular ve- 
locity of the satellite. This may be dealt with by introducing 
multiple knots at the appropriate instants (Appendix D.4 and 

Ib31). 



Having redefined the knot sequence, it is necessary to re- 
initialize the spline coefficients a„, which must now refer to the 
new knot sequence. This is most easily done by evaluating q(f) 
for a regular time series, with a sampling interval much smaller 
than At (e.g., 1 s), and fitting the new spline to the time series. 

5.3. Calibration updating (C) 



The geometric instrument model is given in Sect. 3.4 We as- 
sume here the generic calibration model in Eqs. (|20|l-([2T]i, in 
which the parameters are indexed by rs. In the calibration update 
block (C) the calibration parameters c are improved by solving 
the third sub-system in Eq. ([38|, i.e., the normal equations 



dRi dR, 



dc dc 



d,^-Y^'^Ri{s,a,c,g)Wi. 



(85) 



The residuals in the right-hand side are computed from the pa- 
rameters values in the current or preceding iteration according 
to the discussion in Sect. 14.61 Because the calibration model is 
linear, the partial derivatives are uniquely determined by the ob- 
servation index /, 



dRi 



dCrs 



<1>„(0 if I e rs, 
otherwise. 



(86) 



In the normal matrix, the element with subscripts {rs)\ and {rs)2 
is non-zero only if there is at least one observation / such that 
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/ 6 (rs)i and / e (rs)2- Depending on how the caHbration param- 
eters are grouped into sets with no common observations (for 
example according to the CCD/gate combination; cf. Sect. |4.3| l, 
the normal matrix will therefore be block-diagonal, which the 
calibration updating takes advantage of in order to save compu- 
tations. It also facilitates distributed processing. 

Since the weights Wi are fixed from the preceding source 
and attitude updating processes, the update dc can be calculated 
in a single direct solution, using the robust Cholesky decomposi- 
tion (Appendix [C|. However, due to the degeneracy between for 
example the large-scale and small-scale AL calibration param- 
eters, this will produce an arbitrary feasible solution dc, which 
does not necessarily satisfy the constraints in Eq. ( 22 1. The con- 
strained update is obtained as 



dc^dc-C{C'C)^C'dc, 



(87) 



whereupon the updated c can be computed. 

The above-mentioned degeneracy among the calibration pa- 
rameters means that the normal matrix calculated according to 



Eq. ( 85 I is singular, which seems to contradict our assumption 
(Sect. 4.3 I that Ncc is positive-definite. However, if C spans the 
null space of Ncc, as it should for a properly formulated set of 



constraints, then it can be seen that Eq. (87i gives the same 
result as solving the updates with the modified normal matrix 
Ncc + ^CC , which is positive definite for any A Q. Thus, the 
procedure outlined above is equivalent to solving the constrained 
system with positive-definite matrix. 

5.4. Global updating 

An arbitrary number of global parameters may be solved for in 
the AGIS system. Global parameters should be defined in such 
a way that their default values, equal to zero, correspond to the 
baseline solution. By not solving for the globals, we implicitly 
set them to zero, resulting in the baseline solution. For example, 
we have a very high confidence in General Relativity, which in 
the parametrized post-Newtonian (PPN) formalism implies the 
parameter y - \. The global parameter related to the gravita- 
tional deflection of light should therefore not be y itself, but for 
example the parameter go in 



r = 1 + ^0 ■ 



(88) 



That is, go = corresponds to the baseline case of General 
Relativity. The global parameter vector is g = (gQ, g\, ...)'. 

The normal equations for the update dg to the global param- 
eter vector are 



dg^-Y^-^R,{s,a,c,g)Wi, 



dg 



(89) 



where the sums are taken over all the observations I, and the sta- 
tistical weights Wi follow from the preceding source and attitude 



updates. The partial derivatives in Eq. ( 89 1 are computed in exact 
analogy with Eqs. ([7T]i-([72| for the source updating, only with 
g replacing s,. The calculation of dui/dg' is not detailed here. 



In the simple iteration scheme (Sect. 4.5 i, the inclusion of 
go representing the PPN parameter y considerably slows down 
the convergence of the astrometric solution. As explained by 
Hobbs et al.| ( 2010[ l, this behaviour is caused by the relatively 
strong correlation between the gravitational light deflection by 
the Sun (proportional to 1 -H y, and directed away from the Sun) 
and trigonometric parallax (directed towards the solar-system 
barycentre, never far from the Sun). |Hobbs et al.] ( |20I0j ) found 



that the convergence rate could be restored by the introduction 
of a pseudo-parameter gi representing a global shift of all par- 
allaxes. (The update to this parameter is solved in each iteration 
but never applied - its value remains at zero and the converged 
values of all the other parameters are unaffected; hence the prefix 
'pseudo' .) It was later found that this artefact is not needed when 
using the conjugate gradients scheme, which gives roughly the 
same rate of convergence whether or not gi is included. 



6. Auxiliary processes 

In this section we describe some auxiliary processes that are not 
necessarily part of the astrometric solution as such, but neverthe- 
less needed in order to construct the astrometric catalogue. They 
concern the definition of the reference system for the source 
positions and proper motions by means of the frame rotator 
(Sect. [6T| i, the selection of primary sources (Sect. 6.2 1, and the 
computation of the standard uncertainties and correlations of the 
astrometric parameters (Sect.|6.3 1. 



6.1. Frame rotator 



As explained in Sect. 4.4 the measurement principle of Gaia 
results in a system of positions and proper motions that is essen- 
tially undefined with respect to an arbitrary (small) off'set in the 
orientation and spin of the reference frame. As a consequence, 
the normal matrix is in principle singular with a rank defect 
of 6. 

While the solution of rank-deficient problems in general 
requires special attention to the singularities, for example by 
adding constraints to avoid numerical instability, no such com- 
plication arises here because of the way AGIS works. Basically, 
a solution is found by iterating between the source and atti- 
tude updatings (the calibration and global updatings play no role 
here because they are to first order independent of the reference 
frame). When the sources are updated, the reference frame is in 
reality set by the (then assumed) attitude; similarly, when the at- 
titude is updated, the frame is set by the (then assumed) source 
parameters. In terms of the matrix formulation of Sect. 4.5 this 



is equivalent to the statement that the preconditioner K is non- 
singular. The end result is that AGIS converges to a solution with 
both the source and attitude parameters expressed in the same, 
but largely arbitrary, reference frame. 

The intention is however that the final source parameters 
(positions and proper motions) shall be expressed in a celes- 
tial reference frame that represents, as closely as possible, the 
International Celestial Reference System (ICRS). For consis- 
tency, it is moreover necessary that the attitude parameters are 
expressed in exactly the same frame as the source parameters. 
It is the task of the frame rotator to accomplish this. A similar 
process was used to align the Hipparcos Catalogue with the ex- 
tragalactic reference frame (Lindegren & Kovalevsk y|I995 1. 

In the following we start with the rigorous definition of the 
rotation correction, then derive a linear approximation applica- 
ble to the small corrections that we have in practice. Finally, we 
discuss the determination of the rotation parameters and their 
application in the AGIS iteration scheme. 

6.1 .1 . Relation between the ICRS and AGIS frames 

Ideally, the astrometric solution should result in parameters that 
are expressed in the BCRS (Sect. 3.1 1, whose axes are aligned 
with the ICRS here represented by = [Z F Z]. However, due 
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to the in principle undefined reference frame of AGIS, the as- 
trometric solution is in effect expressed relative to a slightly dif- 
ferent triad, which we denote C = [ZFZ]. The two reference 
systems, which for simplicity will be referred to as the ICRS and 
AGIS frames, are related by a time-dependent spatial rotation 
given by the quaternion f(r); thus the coordinates of the arbitrary 
(fixed) vector v are transformed according to the frame rotation 
formula 

{C'v, 0) = f(f)"' {C'v, 0) f(f) (90) 



(cf. Eq. A. 15 I. Due to the kinematical constraints of the AGIS 
solution, f(f) describes a uniform spin motion of the two frames 
with respect to each other. 



For consistency with Lindegren & Kovalevsky ( 1995 i we 
parametrize f(f) by means of two vectors s and (o represent- 
ing corrections to the orientation and spin of the AGIS frame. 
More precisely, the parameters of the frame rotator are the six 
coordinates of the vectors in the AGIS frame at some adopted 
frame rotator epoch ff, (not necessarily the same as the reference 
epoch fep of the astrometric parameters). These coordinates are 
denoted Sx, Sf, s^, cjx, cof^ ™d w^; according to our kinematical 
assumption they are strictly constant numbers. For the arbitrary 
epoch t the frame rotator quaternion is, therefore. 



f(f) = Q[(f - ffr)C'6>]Q(C'e) , 



(91) 



where Q is the function introduced by Eq. ( |A.12[ l. Equations (j90] 



and (91 1 provide the basis for the rigorous transformation of 



any data between the two frames, given the rotation parameters 

C'e — [Bx By Bz]' — i^x '^zJ - 

While the above expressions are strictly valid for arbitrarily 
large rotation parameters, we have in practice ||e||, ||(f - ffi-)(j|| < 
20 mas ^ 10"^ rad, at least in the final iterations of AGIS. 
This means that second-order terms are completely negligible 
(< 0.002 //as). To first order we have 



f(f) =^ Q[C'e + (t- ffr)C'w] , 



(92) 



and the vector part of Eq. ( [90| ) becomes, to the same approxima- 
tion. 
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ffr) OJx 
ffr) Wy 
ffr) Wz 



(93) 



6.1.2. Transformation of the astrometric parameters 

Let a, 6, fia*, fj-s be the position and proper motion parameters 
for a source as derived in AGIS, that is referring to C. For brevity 
we omit here the source index (/), and do not consider the par- 
allax TUi and radial proper motion firi which are independent of 
the frame orientation. In analogy with Eq. (|5]l we have the nor- 
mal triad [p q r] with respect to the AGIS frame, where r is the 
barycentric direction to the source at time fep, p - {X xr) and 
q - r X p; its coordinates in the AGIS frame are given by the 
columns of the matrix 



C' [p q r]^ 



-sina -sin 5 cos a cos 5 cos 5 
cos a -sin 5 sin a cos 5 sin a 
cos 6 sin S 



(94) 



At the source reference epoch fgp the direction cosines of r are 
related by the frame rotation in Eq. ( [9T] i; thus 



{C'r, 0) = f(fep)-' {C'r, O)f(fep). 



(95) 



From C'r = [r ^- r, r,]' the position parameters in the ICRS frame 
are obtained as 



a - atan2(r^,, , 6 = atan2 ^r,, ^Jrl~+~r^^ . 



(96) 



The transformation of the proper motion components is a bit 
more complicated, as they are expressed with respect to axes that 
are physically (slightly) different in the two frames, viz., p, q in 
the AGIS frame, and p, q in the ICRS frame. However, the time 
derivative (at epoch fgp) of the barycentric direction to the source 
is a fixed vector in space, known as the proper motion vector. In 
a kinematically non-rotating system it can be written 



(97) 



where the last term is the correction for the spin of the AGIS 
frame. The coordinates of the proper motion vector in the two 
frames, 

C'fi = C'piJa* + C'qus 



and 



(98) 
(99) 

(100) 



From Eq. ( [97| ) the proper motion components in the ICRS frame 
are then 



C'// = C'pfia* + C'qfis - (C'w) X (c'r) 
are related by a frame rotation analogous to Eq. ( [95] 

{C//, 0) = f(fep)-' {C'/Z, O)f(fep). 



p'n = {C'p)' (C'fi) 



q'fi^iC'qnC'fi) . (101) 



For given (a, 5,/i„»,/ia) and (C'e, C'a>), the sequence of cal- 
culations is therefore: 



1. Calculate f(fep) by Eq. (91 



2. Calculate C'[p q r] by Eq. (94i; 



3. Calculate C'r by Eq. (|95| and C'fi by Eq. ( (T00| ; 

4. Calculate a and 6 by Eq. (96 1 and C'p and C'q by Eq. (|5]l; 

5. Calculate yu^, and yUj by Eq 
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As we have not employed the approximations in Eqs. (|92 
these transformations are rigorous. 

6.1 .3. Transformation of tine attitude parameters 

In analogy with Eq. (j9| the attitude quaternion q(f) derived in 
AGIS defines the transformation from the AGIS frame C to the 
SRS S as a function of time; thus for the arbitrary vector v 



{S'v,0)=q(f)-MC'v,0)q(f). 



Solving {C'v, 0} and inserting into Eq. (90 1 yields 



{C'v,0} = f(f)-'q(f) {S'v,0} q(f)-'f(f). 



(102) 



(103) 



Comparison with Eq. (|9]l shows that the corrected attitude is 
given by 

q(f) = f(f)-'q(f). (104) 

In practice the AGIS attitude q(f) is expressed in terms of B- 
splines by means of coefficients a„ as in Eq. ( lOl. Th e res ult of 



the time-dependent transformation by f(f)"' in Eq. (104i can- 
not, in general, be exactly represented by means of B-splines. 
However, since the transformation is changing extremely slowly 
in comparison with the duration of the support of each B-spline 
(~ 1 min), and also the changes of a from knot to knot are very 
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small, we make a negligible error by transforming the coeffi- 
cients instead of the attitude quaternion. Thus we use 



where 



3/1 — f(fn) 3n 7 



(105) 
(106) 



is the time half-way through the support of B„(t). 



6.1 .4. Determination of tine frame rotator parameters 

The parameters C'e = [e^, Sf, and C'w = V^x^ , w^]' 
are determined by a weighted least-squares solution, using as in- 
put the differences in positions and proper motions, for a subset 
of the sources, between the AGIS results and a priori data. Three 
kinds of sources may be used for this purpose: 

- A subset 5nr of the primary sources can be assumed to de- 
fine a kinematically non-rotating celestial frame. Typically 
this subset will contain some 10^ to 10* quasars and point- 
like galactic nuclei, mainly identified from ground-based 
surveys and photometric criteria. This subset effectively de- 
termines (x>. 

- A subset 5p of 5nr in addition have positions (a,S) accu- 
rately determined with respect to the ICRS by means that 
are completely independent of Gaia. Typically it will contain 
the optical counterparts of extragalactic objects with accu- 
rate positions from radio interferometry (VLBI). Due to the 
cosmological acceleration effect described below it is neces- 
sary to assign an epoch fp to each such position. This subset 
effectively determines e. 

- The third subset 5 pm consists of primary sources that do not 
a priori belong to the non-rotating subset, but have positions 
and/or proper motions that are accurately determined with 
respect to the ICRS independent of Gaia. This could include 
radio stars observed by VLBI, or stars whose absolute proper 
motions have been determined by some other means. The as- 
trometric parameters of a source in this subset are denoted a, 
6, flat, pis and refer to the epoch fpM (the parallax is irrelevant 
here, as it is identical in both frames). It is not expected that 
this subset will contribute very significantly to the determi- 
nation of e and/or ai, but they are included in the discussion 
below since they may provide important consistency checks. 

In the following we derive the appropriate observation equations 
for the different kinds of sources. The derivation assumes that the 
frame rotator parameters are numerically small so that the lin- 
ear approximation in Eq. ( [93] l applies. For the (weighted) least- 
squares estimation of the frame rotator parameters it is, further- 
more, necessary to assign the appropriate statistical weights to 
the observations and to have procedures for identifying and han- 
dling outliers. These issues are however not discussed here. 

In the frame rotator solution, the six parameters must be 
complemented by three more parameters ax, ay, flz taking into 
account the acceleration of the solar-system barycentre in a cos- 
mological frame ( |Bastian|1995[ |Gwinn et al.|1997] |Kopeikin &| 
Makarov 2006 1. Such an acceleration, by the vector or, will cause 



a systematic 'streaming' (dipole) pattern of the apparent proper 
motions of extragalactic objects, described by 



^0 = (/ - rr')a . 



(107) 



Here r is the direction to the source and a - a/c to first order 
in c"', where c is the speed of light. The galactocentric acceler- 
ation of the solar-system barycentre by Hcfll ^ 2 x 10" 



expected to produce a proper motion pattern with an amplitude 
of ||a|| ^ 4 //as yr"', which Gaia should be able to detect given a 
sufficient number of quasars among the primary sourcesplThe 
additional parameters introduced in the frame rotator solution 
are the components of a in the ICRS, or [ax ay az] = C'a; they 
may be expressed in the same unit as the proper motions. 

Observation equations for a source in Sm- A kinematically 
non-rotating source should only have an apparent proper motion 
due to the cosmological acceleration. Equating fi in (97i with 
fiQ from Eq. ( 107 1 and taking the scalar products with p and q 



results in the two observation equations 

p'a + q'oj - pat 
qa - p'co = jls 



(108) 



where we have used the scalar triple product rul^^for the terms 
including (o. These equations are linear in the unknown accel- 
eration and spin parameters, and the coefficients are the known 
coordinates of p and q in either C or C (to the adopted approxi- 
mation the coefficients are the same in the two frames). 

Observation equations for a source in Sp. In order to com- 
pare positions it is necessary to choose an epoch at which to 
make the comparison. At the chosen epoch of comparison, f, the 
barycentric coordinate direction of the source is, to first order in 
the proper motion. 



Mb(0 = '•(fp) + (f - tp)fXo 

= r(tep) + (t- fep) iPiia* + qfls ' X r) 



(109) 



In the first equality we have made the assumption that the source 
has the apparent proper motion //q when observed in the ICRS 
frame; the second uses the same proper motion vector derived 
from the AGIS data according to Eq. ( [97| . If we now compute 
the coordinates of Hb(0 in C (using the first equality) and C 
(using the second equality), they must be related according to 
Eq. ( [93] l. Resolving the coordinate differences along a and S we 
obtain the observation equations 



(f - tp)p'a + q'{e + (f - ffr)w) = Aa* 
(f - tY-)q'a - p'{s + {t- tb)u) - A5 



(110) 



where 
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cos 6 cos a - cos 6 cos a 
cos 6 sin a - cos S sin a 
sin 5 - sin S 



+ (f-fep) 



p6 
(111) 

The observation equations in proper motion are of course the 
same as in Eq. ( 108| l. 

Returning to the choice of comparison epoch f, it is clear 
that the result in terms of the estimated frame rotator parameters 
should in principle not depend on this choice. However, that will 
only be the case if the statistical correlations among the data are 



" The expected acceleration due to, for example, the Andromeda 
galaxy or the Shapley Supercluster ( [Proust et al.| |2006') are at most 
~ 10"^ of the galactocentric acceleration. On the other hand, nearby 
massive galactic objects and large-scale deviations of the galactic po- 
tential from axisymmetry could conceivably produce a larger deviation 
in the direction of the total acceleration. 
a'{b xc) = b'(c xa) = c'{a x b) 
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taken into account in the least-squares estimation. Otherwise one 
should choose f to minimize these correlations. From Eq. ( 



it is seen that the right-hand sides of Eq. (108 1 and (110 1 are 
uncorrelated if f = fep, provided that the position and proper 
motion errors in AGIS are also uncoiTelated (which is generally 
the case if fep is appropriately chosen, i.e., close to mid-mission). 
Consequently we suggest using t - fep. 

Observation equations for a source in 5pm. Here it will be 
necessary to distinguish two cases depending on how the proper 
motions have been measured. If p^g are the proper mo- 
tion components of a source measured relative to background 
quasars, then the observation equations are simply 



pa + q (o^fia*- /^ff. 
q'a - p'cD = lis- fig 



(112) 



108 1 in case the mea- 



As expected, the equations simplify to Eq 
sured proper motion is zero. The observation equations obtained 
from the comparison of positions are the same as in Eq. ( llOi 
but with right-hand side 



cos 6 cos a - cos 6 cos a 
cos 5 sin a - cos 5 sin a 
sin 5 - sin S 
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-it- fp) 
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If, on the other hand, the proper motion is not measured rel- 
ative to the local extragalactic background, but in a global non- 
rotating frame, then it already includes a contribution from //q 
and the appropriate observation equations are obtained by delet- 
ing the terms depending on a: 



-p'cxi ^fls-jj-S 

q'{s + (t - tb)(o) - Aa* 
-p'{e + (t - tf,)(o) = A6 



(114) 



6.1 .5. Tine null space vectors 



In Sect. 14.41 we introduced the n x 6 matrix V whose columns 
span the null space of the normal matrix A^. For completeness 
we give here the explicit expressions for one such set of null 
vectors. Any small change in the unknowns x, by a linear com- 
bination of the columns in V, will leave the calculated residu- 
als unchanged. Applying the frame rotator for arbitrary e and a> 
obviously leaves the residuals unchanged, and we can therefore 
compute the columns of V as the partial derivatives of x with re- 
spect to the six frame rotator parameters. Since we are concerned 
with infinitesimal changes, the distinction between the AGIS and 
ICRS frames is no l ong er necessary. If V is partitioned similarly 
or y = [V'„ V'„, V'„ V']', we find by 



to X and b in Eq. (|31 
means of Eq. ( |1 14| l 
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where t = fep - ftv and we have omitted the source index / on 
the matrix elements. The order of the astrometric parameters is 



(ff*, 6, m, flat, Ps)- From Eqs. ( [92] l and \\Q5\ we similarly ob- 
tain for the attitude parameters a„ - [ux, Oy, a,, a„]. 
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where t„ = t„ - f^. The calibration and global parameters are not 
affected by the frame rotator, so = and Vg - 0. 

Let dhea vector of small changes to the unknowns x, as for 
example the update computed in one of the AGIS iterations. In 
some situations it is desirable to remove from d its component 
in the null space, i.e., to project it on the row space of A^. This 
will for example ensure that the orientation and spin of the AGIS 
frame is not, on the average, changed by the update. In principle, 
we could achieve this by a process analogous to Eq. ([87]i: 



d^d-V(V'V)^V'd, 



(117) 



where d is the projection of the update on the row space of A^. 
This is equivalent to solving the unweighted least-squares prob- 
lem Vz - d, yielding the orientation and spin components as 
Z = {V'Vy^V'd, followed by the subtraction of the null space 
component Vz- In practice, this can equivalently be achieved by 
means of the frame rotator, without the need to compute V ex- 
plicitly. 

6.1 .6. Role of the frame rotator in AGIS 

The frame rotator process consists of the three steps: (i) de- 



termine the frame rotator parameters according to Sect. 6. 1 .4 
(ii) correct the astrometric parameters for all sources according 
to Sect. 6.1.2[ (iii) correct the attitude parameters according to 
Sect. 6.1.3 In principle, this process only needs to be run after 
convergence of the AGIS iterations; nevertheless, there may be 
a case for running it after each AGIS iteration, preventing a pro- 
gressive deviation from the ICRS in the course of the iterations. 

In particular, for the simulation experiments described in 
Sect. |7] it was found necessary to run the frame rotator after 
each iteration, in order to be able to compare the results of each 
iteration with the 'true' astrometric parameters (used as input 
to the simulations). Without this coiTection, the differences be- 
tween the 'estimated' and 'true' parameters for the source posi- 
tions and proper motions would have been grossly distorted by 
frame errors originating from the starting values of the attitude 
parameters. In this case all the primary sources were treated as 
belonging to the subset 5pm, for which Eq. ( |114| i is appropriate. 

6.2. Selection of primary sources 

The astrometric core solution does not use all the sources ob- 
served by Gaia, but only a subset of them known as the primary 
sources. The selection of this subset is made iteratively, based 
on the results of earlier astrometric solutions and other processes 
such as double-star and variability analysis (not discussed in this 
paper). The main criterion for a primary source is that its proper 
direction, at all times when it is observed by Gaia, is adequately 



modelled by the standard astrometric model outlined in Sect. 3.2 
Examples of sources that should be excluded based on this cri 
terion are solar-system objects, short-period astrometric binaries 
with a significant size of the photocentre orbit, long-period as- 
trometric binaries with a significant curvature of the photocentre 
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orbit, certain unresolved binaries where one or both of the com- 
ponents are variable, and active galactic nuclei (AGNs) with sig- 
nificant variation of their photocentre positions. Variable stars, 
long-period resolved binaries, eclipsing and spectroscopic bi- 
naries need not be excluded a priori from the set of primary 
sources, although many of them are potentially problematic from 
an astrometric modelling viewpoint (e.g., a variable star might be 
part of an unresolved double or multiple star, resulting in a vari- 
able photocentre position). On the other hand, partially resolved 
double/multiple stars and other extended sources will be prob- 
lematic even if their photocentres strictly adhere to the basic as- 
trometric model, Eq. (|3]l, because technically the determination 
of the photocentre becomes more difficult and less precise. 

For the calibrations there are other requirements on the pri- 
mary sources, in particular that there are enough of them at var- 
ious magnitudes and colours, while their sky distribution is less 
important. Securing a sufficient number of primary sources for 
the calibrations will tend to include many more primary sources 
in some areas, such as the galactic plane, resulting in a very non- 
uniform distribution across the celestial sphere. 

6.2.1 . The number of sources needed for AGIS 

The number of sources required for the Astrometric Global 
Iterative Solution is driven by the calibration needs of having 
representative numbers of sources of different magnitudes, and 
the attitude needs of having a sufficient number of sources in 
every knot interval. 

Let us consider first the requirements for the geometric 
small-scale calibration of the CCDs (Sect. |3.4[ ). The angular ex- 
tent of a single CCD in the across-scan direction is about 0.1°. 
It scans the celestial sphere at a rate of 1° min"' and there- 
fore covers about 2000 deg^ per week, if both fields of view are 
counted. Since there are 1966 pixel columns across the CCD, we 
have the convenient rule of thumb that each pixel column covers 
1 deg^ per week. Thus, if the average density of suitable primary 
sources is D deg"^ and we require a minimum of observa- 
tions of such sources per pixel column for its calibration, then 
the minimum time needed is N/D weeks. For example, with 10*^ 
primary sources we have D = 2400 deg"^, and it is then reason- 
able that the small-scale calibration can be made, at a resolution 
of one or a few pixels, in a matter of weeks. However, for the 
gated observations of bright sources (G < 12 mag), the available 
numbers are much smaller and it will be necessary to sacrifice 
resolution in time, number of pixels, or both in order to have a 
sufficient number of observations per calibration cell. For exam- 
ple, the gate used for the brightest sources will perhaps mainly 
be used for G - 5.7 to 8.8; the total number of such stars is 
about 130000 (from the Tycho-2 Catalogue; p0g et al.|2000, ) or 
Z) ~ 3 deg"^, of which only a fraction will be suitable as pri- 
mary sources. Thus, even over the whole five-year mission there 
will only be a few hundred observations per pixel column. From 
these considerations it is clear that as many as possible of the 
bright stars should be selected as primary sources. 

Consider next the requirements for the attitude determina- 
tion. The minimum reasonable knot interval is of the same or- 
der as the transit time over a CCD, or about 5 s. Since we re- 
quire both along-scan and across-scan measurements, and the 
latter are normally only provided by the SM and some of the 
AFl observations, we assume conservatively one observation in 
each coordinate per field-of-view transit. There are seven CCDs 
across the width of the field of view; the area scanned is there- 
fore 1000 deg^ per day in each field of view, or 0.06 deg^ per 
5 s interval. If we require, say, 100 transits per knot interval for a 



reliable attitude determination, then the minimum density of pri- 
mary sources is D = 1700 deg"^, or some 70 million sources in 
total. To achieve this density in the galactic pole regions requires 
that stars as faint as G - 19 are included among the primary 
sources. Thus it is clear that the design aim of ^ 10^ primary 
sources is quite reasonable, and that these will have to include 
both very bright and very faint stars, as well as many quasars 
down to G = 20 for the extragalactic link. 

Apart from these minimum requirements, it must be main- 
tained that the quality of the solution will only improve, the 
more (good) primary sources are included. Stated in the neg- 
ative sense, the solution quality cannot improve by removing 
good-quality primary sources. From this viewpoint one should 
aim to include as many primary sources as possible in the final 
solution. 

On the other hand, one should not forget that it is possible to 
run AGIS with much fewer primary sources by disabling short- 
term small-scale calibrations and using longer knot intervals for 
the attitude. This will increase modelling errors, which however 
is acceptable for initial runs where the input data have not yet 
been properly calibrated. 

6.2.2. Selection criteria 

As outlined above a source has to pass several tests, derived from 
different processes, in order to qualify as a primary source. The 
most important test is derived from the AGIS solution itself, and 
is based on how well the standard astrometric model (Sect. |3.2| i 
fits the data. However, if initially we want to limit the number of 
primary sources, a somewhat more sophisticated selection pro- 
cedure is needed to guarantee the minimum requkements. 

Each source carries an attribute representing the 'relegation 
factor' U, which is a floating-point number ranging from about 
1 to infinity. U ^ I implies the source is perfect for use in the as- 
trometric solution, while successively larger values indicate less 
suitable sources. The relegation factor may incorporate the re- 
sults of several different tests, and therefore provides a continu- 
ous variable for use in the selection process. The name derives 
from the need to 'relegate' a primary source into a secondary 
one when U exceeds a predefined value, which may happen for 
example in the course of the AGIS iterations, or from one so- 
lution to the next. On the other hand, a secondary source may 
be promoted to a primary if its U value decreases below the set 
threshold. This suggests that all potential primary sources should 
be processed through the source update (Sect. |5.1| l, after which 
its status as primary/secondary may be decided. 

The excess source noise e,- estimated during the source up- 



dating (Sect. 5. 1.2 1 may be a good starting point for calculating 
the relegation factor, e.g.: 



U, 



,-7 



1 + [€i/e(Gdf 



(118) 



where e(G) is a normalization factor depending on the magni- 
tude. The choice of the function e(G) determines the balance be- 
tween absolute and relative contributions to the modelling error 
budget. With the choice e(G) - crf-^ (the formal along-scan ob- 
servational standard uncertainty for a source of magnitude G), 
Ui approximates the RMS normalized residual of the source. 
Selecting sources based on this f/,- tends to discriminate against 
bright stars where modelling errors may dominate over photon- 
noise errors. On the other hand, choosing e(G) = constant means 
that only sources with the smallest e, are accepted; this may re- 
move too many of the faint sources, where e, is still small in com- 
parison with crf^. A reasonable compromise between these two 
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extreme cases should be found. The value from Eq. ( |1 18[ ) can 
later be combined with other factors indicating for example pho- 
tometric variability, or some other potentially problematic prop- 
erty, so that in general the relegation factor can be determined 
with some formula using a combination of the source attributes. 

Apart from the relegation factor, which indicates whether a 
source is at all suitable as a primary source, the general principle 
should be to maximize the total weight of the primary sources. 
The weight of a source / is defined as 



Wl 



(119) 



where the average is taken over the n, - Y^iei 1 accepted AL 
observations of the source. Note that we do not use the more 
obvious definition W, = Y^iei with Wj from Eq. ( [62| , since 
we do not want to penalize a source by the excess attitude noise 
in some of its observations, nor favour a source because it is 
observed many times due to the scanning law. 

Having defined the relegation factor and weight per source, 
the selection of primary sources can be made to maximize the 
total weight with due regard to sky uniformity (for attitude de- 
termination) and magnitude distribution (for instrument calibra- 
tion). A possible procedure is the following. 

We start by specifying the minimum density Dmin (deg^^) 
required for the attitude determination, the targeted total number 
Mot of primary sources, with A^tot ^ (129600/7r)Dniin, and the 
maximum acceptable relegation factor C/max- For the geometric 
calibration of gated observations we may also specify minimum 
numbers {A^^) for several intervals in G. Then: 

1. Using a coarse-grained tessellation of the celestial sphere 
(see below), select in each pixel the A^^, sources with the 
largest W,- that satisfy [/,■ < t/,nax, where Np is the minimum 
number per pixel that will ensure D^i^- 

2. For each magnitude bin with a required minimum number 
A'^, count the actual number of primary sources already se- 
lected; if it is less than Ng add sources with [/, < f/,nax based 
on Wi. 

3. If the total number after Step 1 and 2 is less than A^,ot, add 
sources with Uj < Umax based on W,. If the total number 
exceeds A^tot, reduce Umax and repeat the process. 

If the required number cannot be reached in a particular pixel or 
magnitude bin, then it is necessary to increase t/max locally for 
that pixel or bin. 

For the tessellation in Step 1 in principle any reasonable way 
to divide up the sphere into cells of approximately the same area 
could be used, but for statistical operations the HEALPix scheme 
(|G6rski et al. |2005 i has some advantages (O'MuUane et al. 
20011). The cell size is determined by the choice of HEALPix 
parameter NSIDE and is important as it predicts the level of 
homogeneity over the sphere. A cell area of about 1/3 of the 
field of view might be close to optimal, and is achieved with 
NSIDE = 128 yielding 196608 cells. 

For the first run with real data some selection must be made 
using the initial star catalogue. In this case the relegation factor 
may be set to 1 for all sources and the selection based entirely 
on their spatial distribution and magnitudes. This will reduce 
the input to the first run of the astrometric solution, after which 
the relegation factor will be updated as described above. The 
secondary-source update step runs on all sources not included 
in AGIS; hence this will set a relegation factor for all sources 
observed by Gaia. 



6.3. Computation of standard uncertainties and correlations 

It is mandatory that the catalogue of astrometric parameters re- 
sulting from the astrometric core solution includes complete and 
reliable information about the expected error statistics. The most 
important quantity is the estimated standard uncertainty of each 
astrometric parameter. However, the statistical correlation be- 
tween the different astrometric parameters - both between the 
different parameters of the same source and between the param- 
eters of different sources - is also important and should be quan- 
tified. Such correlations are produced both by attitude modelling 



errors (Sect. 5.2.5 i and the statistical uncertainty due to the finite 



astrometric weight of the sources contributing to the attitude de- 
termination. More generally, we need a method to estimate the 
5x5 covariance matrix Cov(s,, s j) of the astrometric parameters 
of any two sources /, j (including the case i = j). In princi- 
ple, these are sub-matrices of the upper-left n , x ris part of A^' , 
the pseudo-inverse of the complete normal equations matrix in 
Eq. ( [30l l. (The pseudo-inverse should be used since the matrix is 
singular.) Although there are methods to compute selected ele- 
ments in A^' that may be feasible even for a system as large as 
this, it is utterly impossible to produce any significant fraction of 
the covariances by a direct computation. Instead, it will be nec- 
essary to rely on approximations and statistical estimates. A first 
approximation is obtained by ignoring the statistical uncertainty 
contributed by the errors of the attitude, calibration and global 



parameters; in this case we can ignore all parts of A^ in Eq. (31 
except Nss and find 



Cov(.„.,)= (A^tg n -J" J (120) 

^ I iii ^ J, 



where the inverse of [A^jj],/ = A'.WiAi is regularly computed as 
part of the source updating using Eq. pT] ). However, in this ap- 
proximation we clearly cannot estimate the covariance between 
sources (/ ^ j), which is unacceptable; moreover, we underes- 
timate the within-source covariance (/ - j) because of the ne- 
glected attitude and calibration errors. Refining this estimate is a 
very important problem which however is addressed elsewhere 
Poll etal.|20T0l[20T2l l. 

A somewhat related problem is the need to be able to trans- 
form the astrometric results, without loss of information, to an 
arbitrary epoch different from the fen used in the astrometric so- 
lution. The standard model in Eq. Q allows the astrometric pa- 
rameters to be transformed in a completely reversible manner, 
based on the assumption of uniform space motion relative to the 
solar-system barycentre. In this process it is necessary to include 
the sixth astrometric parameter even if it is (partially) de- 
rived from a spectroscopic radial velocity. Similarly, the trans- 
formation of the covariance matrix must consider all six param- 
eters. The relevant formulae are given in Vol. 1, Sect. 1.5.5 of 
The Hipparcos and Tycho Catalogues ( ESA||1997 1 and are not 
repeated here[^Indeed, the normal equations for the six param- 
eters contain the full information of the Gaia observations of a 
particular source with respect to the standard astrometric model, 
and for this reason it is desirable to compute and store these nor- 
mals even if only a subset of them is used in the actual source 
update (cf. Sect. 



5.1.31. 



The sixth parameter yu, is denoted ^ in ESA ( 1997 1. 
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The implementation has been briefly outhned in|Lammers et al 



( [200 9') and more comprehensively in O'Muflane et al. ( |2011 
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Fig. 8. Simplified architectural design diagram of AGIS. The 
rounded boxes are independent Java processes running in par- 
allel on different nodes of a multi-CPU processing cluster. The 
arrows indicate main data exchanges between the various pro- 
cesses. Input/output-related data flow from/to the storage system 
is not shown, and likewise some important but conceptually ir- 
relevant interactions between some elements (e.g., the Servers 
and the RunManager). 



7. Software implementation and demonstration 
solutions 

The practical reahsation of the AGIS scheme as outlined in the 
preceding sections is contained in software that is being devel- 
oped, since early 2006, jointly by teams at the European Space 
Astronomy Centre (ESAC) and Lund Observatory within the 
Gaia Data Processing and Analysis Consortium (DPAC). It is a 
central software module embedded into a complex, overall data 
processing system ( |0'Mullane et al.|2007j l, whose ultimate goal 
is the creation of the final Gaia catalogue with a targeted release 
date around 2021. 

This section gives a concise overview of the architectural 
design of the AGIS software (Sect. |7.1| i on the one hand, and 
on the other presents some selected results from a recently 
(June-November 2011) conducted large-scale astrometric solu- 
tion (Sect. 7.2 1 using as input simulated data for more than 2 mil- 
lion sources. While this number is still a factor 50 smaller than 
the number of primary sources foreseen in the final AGIS runs 
(around 2018-2020), and the present simulations are much sim- 
plified especially with respect to the attitude modelling, we be- 
lieve that this proof-of-concept run demonstrates the practical 
validity and correctness of the key theoretical concepts described 
in this paper Future tests will involve simulated data sets that 
are both larger and more realistic, with a parallel further devel- 
opment of the algorithms and software in view of the added new 
complexities. 



7. 1 . AGIS software overview 

The term AGIS is subsequently often used to refer to the actual 
software implementation of the scheme. Like virtually all Gaia 
data processing software, AGIS is entirely written in the object- 
oriented Java programming language ( |0'Mullane et al.||2010| l. 



to 

which the interested reader is referred for more details. In this 
section, some key classes are briefly described; following Java 
naming conventions their names (given in italics) are concate- 
nated capitalized nouns, as in RunManager. 

Owing to the number of sources and the associated large 
data volumes that have to be handled (see Sect. [TJ it is clear 
that a well-performing system must be distributable on mod- 
ern, multi-node, multi-core processing hardware environments 
and make optimal use of parallelism as far as permitted by the 
AGIS scheme. Another elementary consideration is that inher- 
ently slow disk input-output operations should be minimized and 
never allowed to be a bottleneck for any of the computing pro- 
cesses. 

These basic requirements have led to the system schemati- 
cally depicted in Fig. [8] with its main components and data flow. 
The central elements are DataTrains, Servers, a RunManager, 
and a ConvergenceMonitor. When an AGIS run starts the 
RunManager splits the entire processing task of the first itera- 
tion into separate and independent jobs which are then taken and 
executed in parallel by DataTrains that have been started on the 
different CPUs of the processing system. Each such job involves 
the processing of all observations for a group of sources (with 
typically 100-1000 sources per group), which is done by loop- 
ing over the sources, one at a time. Before the loop is entered all 
the data that are needed for the processing (observations, as well 
as all source, attitude, and calibration data) have been loaded 
into memory and are passed on to the core algorithms. This is 
a key design aspect which, together with a suitable grouping of 
the data on the storage system, ensures that the system is never 
input/output limited and that it has a constantly high utilization 
of the CPUs (typically at flie 90% level). 

Each AGIS iteration starts with the updating of the respec- 
tive sources (see Sect. \5.\\ . Computed provisional updates are 
then written to the storage system and, finally, the observations 
together with the updated source data are sent to attitude, cal- 
ibration, and global servers via the CPU-interconnecting net- 
work. The servers themselves are distributed in different ways 
(see, e.g.. Sect. 5.2.2 for attitude), but all are similar in that 
they accumulate normal equations by adding observation equa- 
tions as outlined in Sects. 5.2.1 5.3 and 5.4 respectively. When 



all DataTrain jobs are finished, the RunManager signals to the 
servers that all observations have been sent. This triggers that 
a Cholesky decomposition (Appendix |C]l is made of the accu- 
mulated normals matrices in every server, that the partial results 
on different servers are combined where necessary (e.g., the at- 
titude segments are joined), and finally that the results are per- 
sisted in the storage system. That marks the end of an iteration. 
Subsequent iterations are then started in the same manner as the 
first one, viz., through the creation of a set of processing jobs. 

Iteration k uses the computed updates from iteration k- 1 and 
generates updated parameters for use in the following iteration 
k + \. This progress is monitored by the ConvergenceMonitor 
through the accumulation of a selected list of statistical quanti- 
ties in the form of graphical plots (e.g., histograms of the up- 
dates of all the astrometric parameters) accessible in real time 
through a web interface. Naively, one may expect that the sys- 
tem can be considered converged if the updates become smaller 
than a pre-defined limit; however, finding an unambiguous and 
automatically verifiable convergence criterion has proven to be 
a surprisingly complex problem (see Sect. 4.4 in Bombrun et al. 
|201 Ip . We believe now that human inspection is indispensable 
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Table 1. Characteristics of the simulated input data and demon- 
stration solution. 



Quantity 



Value 



Duration of science mission 
Number of sources 

Number of along-scan (AL) observations 
Number of across-scan (AC) observations 



5.0 yr 
2 256222 
1.625 X 10' 
1.805 X 10** 



Standard uncertainty per AL / AC observation (representative): 

G< 13 92 ^as/ 520 //as 

G = 15 230yuas/ 1350 pas 

G = 17 590 yuas / 4000 pas 

G = 18 960 pas/ 7600 pas 

G = 19 1600 pas / 16000 pas 

G = 20 2900 pas / 38000 pas 

Number of astrometric parameters 1.128 x 10' 

Number of attitude spline knots 6.575 x 10' 

Number of attitude parameters 2.630 x 10' 

Number of calibration parameters 7 812 

Number of global parameters 1 



to assess the convergence status of the system reliably and, ulti- 
mately, decide on the termination of the iterative loop. 

An important feature of the RunManager is the ability to use 
different algebraic solution methods by selecting among differ- 
ent available iteration schemes. The description above explains 
the 'kernel' computation of provisional updates to the unknowns 
employing the Gauss-Seidel-type preconditioner approximation 
(Eq. ~ 



53 in Sect. 4.6 1 to the full normal matrix of the system 



(Sect. 4.5 1. How these provisional updates are actually com- 
bined at the end of an iteration to form the final updates to the 
unknowns depends on the chosen iteration scheme. AGIS can 

, the simple itera- 



use all four schemes outlined in Sect. 4.7 



VIZ., 



tion (SI), accelerated simple iteration (ASI), conjugate gradients 
(CG), and hybrid scheme (A/SI-CG). The previous description 
essentially refers to SI; in the other schemes there are a few ex- 
tra steps which however are immaterial for understanding the 
software system. 

AGIS is controlled through a list of a few hundred key- 
value parameters ('properties') which are configured before a 
run starts. Examples are: the numbers of servers and threads, 
the size of DataTrain }ohs, the starting values for the unknowns, 
and the employed solution method. Also, which update blocks 
are active during a run is controlled via properties. Any com- 
bination that involves at least a source update is possible, e.g., 
SACG, SA, SCG. 

The optimum number of data trains in a run is a complex 
trade-off between the available number of CPUs and memory, 
usable network bandwidth (more trains create more inter-CPU 
traffic), and the given maximum storage system throughput. By 
design of the system, the run time should scale inversely with the 
number of data trains (assuming there are enough CPUs), i.e., 
doubling the number of DataTraim should halve the run time. 
In the tests done until now, the run times are very satisfactory; 
however, more work remains for achieving the desired optimal 
scaling behaviour A first AGIS run using simulated data for a 
5 yr mission with 50 million primary sources was successfully 
completed in June 2011. This being only a factor 2 less than 
the baseline 100 million sources envisaged in the final AGIS run 
towards the end of the mission, it marks an important milestone 
in the development of the operational system. 





^ ,^ '"^^Wk* 




E 10000 



Fig. 9. All-sky projections of (from top to bottom) the total stel- 
lar density in the input data to the demonstration solution, the 
number of AL observations per source, and the resulting spatial 
density of AL observations. These and all subsequent sky maps 
use the Hammer-Aitoff equal-area projection in equatorial coor- 
dinates, with a running from -180° to -1-180° right to left. Top: 
The simulated sky contains some 2 million single stars cover- 
ing the Gaia magnitude range 6 < G < 20. The density ranges 
from less than 1 deg"^ around the galactic poles to a maximum 
of about 4800 deg"^ near the galactic centre in the bottom-right 
quadrant of the map. Middle: The number of along-scan obser- 
vations per source reflects the scanning law of Gaia, which is 
roughly symmetric around the ecliptic plane and gives an over- 
abundance of observations at ecliptic latitudes +45°. Bottom: 
The combination of the source density and the scanning law 
gives the displayed density of along-scan observations. 



7.2. Demonstration solution 

7.2. L Data simulation and model assumptions 

Since the start of the development in early 2006, AGIS has 
been tested continuously using simulated datasets of varying 
complexity and size generated by the Gaia System Simulator 



(Masana et al. 2005 i created by DPAC's dedicated coordina- 



tion unit for Data Simulations (CU2; Mignard et al.|20(j8 Luri 



& Babusiaux 2011). In the following we present the results of 
a test solution using 5 years of simulated astrometric observa- 
tions for about 2 million well-behaved (single) stars with a re- 
alistic distribution both in magnitude and coordinates, based on 
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Fig. 10. Map of the parallax errors in iteration 1 . The iterative 
astrometric solution starts off with spatially correlated errors in 
the astrometric parameters, generated as described in the text. 
These (initial) parallax errors have amplitudes of a few tens of 
mas. The number at the top-right corner of this (and subsequent) 
maps is the maximum value of the displayed range in yuas. 



the Besangon galaxy model ( jRobin et al.||2003] l. Figure [9] (top) 
shows the spatial source density of the data set in equatorial co- 
ordinates. Of particular interest for the AGIS run is the stark 
density contrast between ~ 1 and 5000 deg"^ mainly depending 
on galactic latitude, resulting in similarly high ratios in the total 
astrometric weight of the sources in Gaia's two fields of view. 
In Bombrun et al. ( |201 1[ ) it was shown (using simulations on 
a smaller scale) that a high weight contrast tends to reduce the 
convergence rate of the astrometric solution compared to a situa- 
tion where the weights are more balanced; however, the solution 
always converges to the correct solution provided that enough 
CG iterations are carried out. We will show that this key result is 
confirmed in the demonstration solution. 

The input data were generated using a fuUy-relativistic 
model of the observed (proper) directions M,(f) in Eq. (jTjl, includ- 
ing gravitational light deflection for PPN parameter y = 1, as- 
suming the Nominal Scanning Law (Sect. 3.3 i, the nominal geo- 
metrical instrument model (Sect. 3.4 1 and nominal performance 



of the instrument (in particular the centroiding accuracies cr^^, 
cr^"-- as functions of G; see Tablefl]). However, in order to test the 
capability to recover a varying Basic angle, a step-wise pertur- 
bation was introduced corresponding to the sinusoidal variation 
of AFj in Eq. (fT9]l with a period of 2.5 yr and an (unrealisti- 
cally large) ampntude of 500 /vas. In the astrometric solution, the 
large-scale AL calibration interval was set to 30 days, matching 
the step width of the perturbation signal. The solution used only 



one global parameter, viz., go = y - ^ as described in Sect. 5.4 
Table [T] lists some statistics of the data and solution, while the 
middle and bottom maps in Fig. [9] show how the number of ob- 
servations varies across the sky. 

Not all elements of the numerical algorithms described in 
this paper have as yet been integrated into the running software 
system which otherwise implements the basic model described 
in Sect. [3] In particular, the estimation of source and attitude 
excess noise (Sects. |5.L2| and |5.2.5| l were not activated in the 
present solution. This was not a problem for the demonstration 
run, as the applied observation noise is well-behaved, purely 
Gaussian. Further development of the AGIS software will to a 
large extent focus on making the solution robust against all kinds 
of unexpected input data. 

The demonstration solution was run with all four update 
blocks (S, A, C, G) enabled, using starting values for the atti- 
tude, calibration and global parameters that were erroneous on 
the mas level. (As explained in Sect. |4.5| and footnote |9] the re- 



sults of the subsequent iterations are independent of the initial 
values for the source parameters, because the updating always 
starts with the source block.) These initial values were created 
by adding Gaussian, uncorrelated errors to the true attitude pa- 
rameters, with a standard deviation of 50 mas, using the nominal 
calibration parameters (i.e., excluding the sinusoidal modulation 
of the basic angle), and a value of go = y - 1 = 0.1 (Sect. [54| . 
An attitude knot interval of 240 s was used in order to have a 
sufficient number of observations per degree of freedom even at 
the galactic poles. This interval is short enough that the attitude 
splines are able to represent the true attitude (i.e., the analytical 
nominal scanning law) with an RMS error of less than 9 //as. 
Although this is larger than the modelling errors aimed at in the 
real data analysis, it is sufficiently small in comparison with the 
typical attitude estimation errors (> 20 yuas; see Fig. 1 1 1 to have 
a negligible impact on the overall astrometric accuracy of the 
present solution. 

During the source update in the very first iteration, the ini- 
tial errors in the attitude, calibration parameters and y propagate 
to the sources, creating astrometric errors of a few tens of mas 
(Fig. 10 1 that are spatially correlated on a scale comparable to 
the attitude knot interval (~4°). These errors are quite hard to 
remove in subsequent iterations, but may be representative of the 
situation encountered by AGIS when processing the real mission 
data based on a fairly uncertain initial attitude and instrument 
calibration. 

With these starting conditions, 135 iterations were car- 
ried out using the conjugate gradients (CG) scheme. A re- 
initialisation of the CG scheme was made after the first 40 it- 
erations to avoid the development of a much slower convergence 
phase observed in some previous runs; after this, no further re- 
initialisation was made. At iteration 135 the typical updates of, 
for example, the parallaxes and the along-scan attitude were at 



or below a level of 5 x 10 /vas (Fig. 



11 



above the numerical noise floor set by the double-preci sion arith 



This is still slightly 



metic (~ 10" rad or ~ 10" yuas). As discussed by Bombrun 
|et al. ( 201 I| l, truncating the iterations before the numerical noise 
floor has been reached implies the presence of spatially corre- 
lated 'truncation errors' having an amplitude of a few times the 
typical updates, or ~ 10" yuas in the present case. We now pro- 
ceed with a more detailed analysis of the results. 



7.2.2. Source results 

In this section we focus on the results for the parallax (tu), which 
is arguably the most interesting parameter from an astrophysical 
viewpoint; moreover, as a scalar quantity independent of epoch 
and reference frame, its statistics can be summarized compactly 
and without ambiguity. However, the behaviour of the position 
and proper motion parameters is qualitatively similar, and some 
results are given in Table|2]and Fig. [14] 

The top diagram in Fig.[TT|shows the typical sizes of the er- 
rors and updates in parallax versus iteration number, as measured 
by the Robust Scatter Estimat^^(RSE). The parallax errors set- 
tle relatively quickly (around iteration 25) at a level of 146 yuas 
and remain stable till the end of the run with updates becoming 
successively smaller, reaching the level of 10"^ fias around itera- 
tion 120. The actual error distribution is symmetric but strongly 
non-Gaussian (in fact more like a Laplace distribution) due to 



The RSE is defined as 0.390152 times the difi'erence between the 
90th and 10th percentiles of the distribution of the variable. For a 
Gaussian distribution it equals the standard deviation. The RSE is used 
as a standardized, robust measure of dispersion in CU3. 
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Fig. 11. Top: Evolution of the typical parallax error (crosses) 
and parallax update (circles) as functions of the iteration number 
The typical error settles at around 146 yuas. Bottom: Evolution of 
the typical attitude error (crosses) and update (circles) as func- 
tions of the iteration number, for the three principal SRS axes 
X (red), y (blue), and z (black). The errors settle at around 167, 
224, and 20 //as, respectively. In both these plots the typical er- 
rors and updates are given by the Robust Scatter Estimate (RSE), 
similar to an RMS value (see footnote [Ts]). 



the variation of star numbers and observational standard uncer- 
tainties with magnitude (cf. Tables[T]and|2]i. The overall parallax 
error RSE of ~ 150 //as is therefore representative for the median 
magnitude, G ^ 19, in good agreement with current accuracy 
predictions (! Lindegren|20 1 0) . The overall sizes of the errors and 
updates shown in Fig.|ll| should however be complemented by 
more detailed statistics as functions of coordinate and magni- 
tude. 



Figure 12 shows the spatial distribution of the parallax errors 
at a few selected iterations. The left column shows the median 
error in each cell, while the right column shows the median ab- 
solute value of the error These quantities serve as robust proxies 
for the mean and RMS values (the RSE is not used for the lat- 
ter as many cells have too few sources for this measure), and 
therefore may suggest the levels of systematic and random er- 
rors as function of position. Figure 13 shows the corresponding 
maps for the parallax updates. After a few iterations, when the 
overall parallax errors are already below the 1 mas level, very 
significant systematic (i.e., spatially correlated) errors of a few 
mas remain, especially in the high-density areas of the galactic 
plane. These are damped in the subsequent iterations, but still 
remain at a level of several hundred //as in the galactic centre 
region around iteration 20, when the overall parallax errors start 



to settle at their final value according to Fig.[TT| At iteration 135 
the regional errors have virtually disappeared, and the error map 
shows a characteristic pattern with the solution being seemingly 
better around the galactic equator than in the polar regions. This 
is purely an effect of number statistics: in the galactic pole areas 
the cells contain rather few sources (often just a single source, in 
which case the displayed values are simply the individual paral- 
lax errors), while closer to the galactic plane the scatter from one 
cell to the next is reduced by the median-averaging over many 
sources. 

The sequence of error maps for iterations 5, 20, and 135 cor- 
roborates a key result of Bombrun et al.| (j201 1|), viz., that by 
iterating long enough the system can cope with large spatial im- 
balances in the astrometric weights, provided that the minimum 
source density allows a good attitude determination at all points. 

The median absolute values of the parallax errors shown in 
the right column of Fig.[T2|quickly settle in a large-scale pattern 
that mainly reflects the expected variation of parallax accuracy 
with ecliptic latitude (see, e.g.. Table 3 in Lindegren|2010[ l. This, 
in turn, depends on the scanning law, i.e., on a combination of 
the number of observations per source (see Fig. [9] middle) and 
the geometric configuration of the scans - for example, the over- 
density of observations at +45° ecliptic latitude does little to im- 
prove the parallaxes, which are then mainly producing shifts in 
the AC direction, while Gaia is primarily sensitive to the AL 
displacement. Number statistics reduce the between-cell scatter 
of the median absolute values as well, which accounts for the 
smoother appearance along the galactic equator 



The update maps in Fig. 13 conform with expectations and 
the preceding discussion. Of particular interest from a diagnos- 
tic viewpoint is the observation that the amplitude and spatial 
distribution of the median updates in the non-converged solution 
give a fair indication of the (systematic) truncation errors. This 
is obviously useful for assessing the state of convergence, as the 
update maps can be constructed for the real mission data as well 
(whereas the error maps are of course unknown). For example, 
based on the median updates in iteration 20 (middle left map of 
Fig. 13 1 one might correctly conclude that truncation errors of 



a few hundred //as remain, especially in the galactic centre re- 
gion, as shown in the middle left map of Fig. 12 By the same 



reasoning it appears that truncation errors at iteration 135 should 
be well below 1 //as. 

In tests using other datasets with lower density contrasts we 
have seen a more rapid convergence. A similar slowdown in con- 
vergence for a non-uniform weight distribution was observed 
and discussed in Bombrun et al.| ( |201 1) based on small-scale 
simulations. A likely mechanism for this slowdown is related to 
the weight contrast problem discussed by Ivan Leeuwen (2005 1 
in connection with the new reduction of the Hipparcos data. Any 
of the fields of view scanning through a high-density area (or an 
area with many bright stars) creates a strong astrometric weight 
imbalance between the two viewing directions, as the other field 
usually points to an area of the sky with much lower source 
density (or fainter stars). Errors in the along-scan attitude cre- 
ate correlated errors in the parameters of all sources observed at 
that time, whether they are in the preceding or following field of 
view. If these source parameters are then used to correct the atti- 
tude, with little counterbalancing effect from the (relatively few) 
sources in the other field, the attitude may get only marginally 
improved, with the net effect of slowing down the damping and 
de-correlation of the errors in the high-density areas. Thus, more 
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Fig. 12. Maps (in equatorial coordinates) of the parallax errors in the three selected iterations k - 5, 20, 135 (top to bottom). The 
left column shows the median of the parallax error zir'** - tjr'™'^, while the right column shows the median of the absolute error 
Izzt'^'^' - tir"™!; each map cell (of about 0.84 deg^) shows the colour-coded value computed for the sources located in that cell. Empty 
cells are shown in black. On every map plot the top left label indicates the iteration number and the top right label is the maximum 
value of the displayed range in yuas. See text for further details. 



iterations are needed compared to a more weight-balanced situ- 
ationF] 



Figure 14 shows the error and update maps of the proper mo- 
tions at iteration 135. Since proper motions are vectors, the dis- 
played quantities are the median lengths of the vectorial differ- 
ences, which are non-negative by definition. The maps are in ex- 
pected agreement with the corresponding parallax ones concern- 
ing visible patterns and structures. A prominent feature absent in 
the case of the parallaxes is the lighter bands of relatively smaller 
proper motion errors around ecliptic latitudes +45° caused by the 
oversampling of these parallels by the scanning law (cf. Fig.[9]l. 
In contrast to the parallax case, these observations do contribute 
to the determination of the proper motion, especially for the 
component in ecliptic longitude. 

All spatial maps in Figs. 



12 ■ 14 show median values com- 



puted from distributions with stars of all magnitudes. A lot more 
information is contained in the magnitude-resolved versions of 



van Leeuwen 



(2007 



" In the new reduction of the Hipparcos data by 
the weight ratio was artificially damped in order to improve the con 
nectivity between the two fields of view in high-contrast situations. We 
have found that this is not required for Gaia provided that the solution 
is iterated to convergence; see Bombrun et al.li 201 1 



the maps, which are not presented here for brevity. Instead, 
Table [2] shows the RSEs (see footnote [TS]) of the errors and nor- 
malized eiTors (see below) of all the astrometric parameters in 
iteration 135, subdivided according to magnitude. The normal- 
ized error is defined as the error (in the case of right ascen- 
sion, Aa* = Aa cos S) divided by the corresponding formal stan- 
dard uncertainty obtained from the inverse of the source normal 
matrix, i.e., by using the approximation in Eq. (120 1. As dis- 



cussed in Sect. 6.3 this approximation underestimates the true 
standard uncertainties of the astrometric parameters by neglect- 
ing the contribution from the attitude uncertainty, which may be 
particularly important for the bright stars where the photon noise 
is relatively less important. 

The RSE of the errors in Table |2] are in reasonable agree- 
ment with recent mission accuracy assessments. For example, 
compared with Eqs. (5.2)-(5.5) in |Lindegren| pO 1 ) the present 
values are a few per cent larger for G < 15, and up to some 20% 
smaller for the fainter stars. This suggests a more conservative 
photon-statistical error budget in Lindegren] ( |2010| l, combined 
with a larger-than-nominal contribution from the attitude uncer- 
tainty in the present solution. The latter effect, further discussed 
in Sect. |7.2.3| is indeed to be expected in the present demonstra- 
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Fig. 13. Same as Fig. 
(right column). 



12 



but showing the updates in parallax, i.e., the median values of w^*' - vj^'' 



(left column) and \tcr'-'''> - tu**- 




Fig. 14. Maps of the error (left) and update (right) in proper motion for iteration 135. Each cell shows the colour-coded median 
error/update in units of //as yr \ where the individual error/update is computed in terms of the equatorial components as (A/i^^ + 



tion solution using far fewer primary stars than planned for the 
real mission. 

A more stringent test of the quality of the solution is obtained 
by considering the RSE of the normalized errors (i.e., after di- 
vision by the formal standard uncertainties as described above), 
which is here denoted g. Ideally we should have g ^ 1 for any 
parameter and any magnitude. As shown in Table |2] this is very 
nearly the case in all parameters for G > 15, meaning that the 
actual errors are roughly consistent with the standard uncertain- 



ties computed from Eq. ( 120 1. For the brighter stars g becomes 



progressively larger, supporting the interpretation that the atti- 
tude uncertainty has a significant impact on the accuracy of the 
bright stars in this solution. For example, the value g = 1 .369 ob- 
tained for the parallaxes of stars brighter than G = 13 suggests 
a quadratic attitude contribution to the parallax errors for these 
stars of [7.5^-(7.5/1.369)^]'^^ = 5.1 //as, while a similar compu- 
tation for the next three magnitude bins gives 5.7, 5. 1 and 5.4 //as 
(with a rapidly increasing uncertainty). Thus it appears that the 
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Table 2. RSE of the errors in the astrometric parameters and of the normalized errors (i.e., after division by the formal standard 



deviations) for different magnitude ranges. The asterisk in a* indicates that the errors are true arcs on the sky; cf. footnote 1 1 See 
text for further explanations. 

RSE of error [/jas and //as yr"'] g = RSE of normaUzed error [unitless] 



Magnitude range 


No. stars 


a* 


S 


m 




1^5 


a* 


6 


■m 






G < 13 


18 253 


6.6 


5.7 


7.5 


4.5 


4.0 


1.507 


1.470 


1.369 


1.443 


1.425 


13 < G < 15 


70 355 


12.4 


10.6 


14.9 


8.7 


7.5 


1.112 


1.100 


1.082 


1.098 


1.100 


15<G< 16 


88 116 


20.2 


17.3 


24.9 


14.3 


12.3 


1.024 


1.024 


1.022 


1.032 


1.024 


16<G< 17 


151639 


30.8 


26.7 


38.4 


21.8 


19.0 


1.010 


1.010 


1.010 


1.014 


1.008 


17 < G < 18 


272424 


49.4 


42.8 


61.8 


34.8 


30.4 


1.004 


1.006 


1.003 


1.002 


1.003 


18 < G < 19 


489253 


83.3 


70.7 


104.1 


58.9 


50.8 


1.001 


1.001 


1.002 


1.003 


1.002 


19 < G 


1 166182 


167.9 


140.0 


207.6 


118.5 


100.2 


1.001 


1.000 


1.000 


1.001 


1.000 


allG 


2256222 


116.8 


98.7 


145.6 


82.4 


70.6 


1.009 


1.009 


1.007 


1.009 


1.008 



values Q > \ found for the parallaxes of the brighter stars can be 
accounted for by assuming a constant contribution, by about 5- 
6 yuas RMS, to the parallax errors from attitude and/or calibration 
errors. For the other astrometric parameters we similarly find a 
constant RMS contribution to the positional errors of about 4- 
5 //as, and to the proper motion errors of about 3^ //as yr"'. It 
will be shown below that these numbers are consistent with the 
actual attitude errors found in the solution. 

It should be noted that the reference epoch for the astromet- 
ric parameters was set to exactly half-way into the mission, i.e., 
fe - 2014.5 for the simulated mission interval 2012.0-2017.0. 
This is optimal in the sense that the positional uncertainties are 
minimized for approximately this epoch, and that the errors in 
position and proper motion are nearly uncorrelated. A reference 
epoch half-way through the mission was also assumed for the ac- 
curacy assessment in Lindegren ( 2010[ l, with which the present 
results have been compared. 

In this solution, the median value of the parallax errors of 
all the 2.2 million sources was not significantly different from 
zero (the actual value was -1-0.004 //as). This shows that, in the 
absence of systematic observational errors, the astrometric solu- 
tion is able to determine the absolute parallaxes of the sources, 
as could be expected from the basic principles of the mission. It 
is especially worth noting that this was achieved while simulta- 
neously determining the PPN y parameter, known to be strongly 
correlated with the parallaxes (cf. Sect. 5.4 1. 



7.2.3. Attitude results 



The bottom diagram in Fig. 1 1 shows the RSE attitude errors and 



updates as a functions of the iteration number, where the small 
differences of the attitude quaternions have been transformed 
into small rotations along the SRS axes (x, y, z) according to 
Sect. A. 6 and expressed in //as. The error component around the 
z axis corresponds to the AL attitude error, while the x and y 
components are linear combinations of the AC attitude errors in 
the PFoV and FFoV|f]The z (AL) errors settle at an overall level 
of 20 //as around iteration 60, while the updates continue to de- 
crease in a similar manner as for the parallaxes (Fig. [TT[ top). 
The RSE values of the x and y errors converge to 167 //as and 
224 //as, i.e., an order of magnitude larger than in z, reflecting 
the larger observational errors in the AC direction (Table [TJ and 



The attitude errors discussed here must not be confused with the 
excess attitude noise 6,, introduced in Sect. |3.6| The latter represents 
modelling errors, which are practically absent in the demonstration so- 
lution. 



the smaller number of AC observations. The ratio of the errors 
about y and x, 224/167 ^ 1.34 is in perfect agreement with the 
value expected from the geometry of the observations (Fig. |2]l, 
viz., tan(rc/2) ^ 1 .34 for a basic angle of - 106.5°. 

The converged AL attitude eiTor of 20 //as is completely con- 
sistent with the previously infeiTed constant RMS contribution, 
by 5-6 //as, to the parallax errors (see Sect. 7.2.2 1, as can be seen 
from the following considerations. For most stars, the propaga- 
tion of random observational errors from individual AL observa- 
tions to the parallaxes (say) is largely governed by geometrical 
factors and the total number of observations per star, and can be 
statistically described by a 'coefficient of improvement' which 
can be estimated to 207.6/2300 ^ 0.09 using the RSE of the 
parallax errors for the faintest bin in Table |2] combined with the 
typical AL observational error at G ^ 19.6 (cf. Table [T]i. This 
factor assumes that the individual observational errors (at each 
CCD) are uncorrelated, which is a very good approximation for 
the photon-statistical centroiding eiTors, but not for the attitude 
errors, which have a correlation length determined by the knot 
interval of the attitude spline. In the demonstration solution, the 
knot interval was 240 s, which is much longer than the time it 
takes an image to cross the nine CCDs in the astrometric field 
(^ 40 s). Therefore it is a much better approximation to assume 
a constant attitude error for the whole field crossing, correspond- 
ing to nine AL observations. As a result, the coefficient of im- 
provement relevant for the attitude error should be a factor three 
larger, or ^ 0.27. The AL attitude uncertainty of 20 //as therefore 
coiTesponds to 20 x 0.27 ^5.4 //as in the parallax, in very good 
agreement with the empirical result of 5-6 //as. For the other as- 
trometric parameters a corresponding calculation yields a contri- 
bution of about 4 //as in position and 3 //as yr"' in proper motion. 
Although these numbers are somewhat smaller than the empir- 
ical estimates in Sect. 7.2.2 (possibly indicating an additional 



contribution from the calibration errors), the overall agreement 
is striking. 

The attitude errors obtained in the solution ultimately come 
from the observational errors of the individual observations, 
through the process of fitting the attitude spline functions to 
these observations. If more observations (of the same quality) 
are added, the attitude errors are expected to diminish inversely 
with the square root of the number of observations (as long as 
the modelling eiTors are not a limiting factor). The present AL 
attitude error of 20 //as is roughly what can be expected from the 
density and magnitudes of primary sources in the demonstration 
solution, as can be seen from the following simple calculation. 
The AL attitude has essentially one degree of freedom per knot 
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interval (240 s). The average number of AL observations per 
degree of freedom is therefore about 2500 (Table [T]). From the 
magnitude distribution in Table |2] and the AL observational un- 
certainties in Table[T]one can estimate that the average AL obser- 
vation carries a statistical weight {cr'^) corresponding to an AL 
standard uncertainty of ctal - 650 /ias (taking into account the 
weight reduction by g^^ for the bright stars). The mean flow of 
observations therefore should allow the AL attitude to be pinned 
down with an uncertainty of about 650 x 2500"'^^ - 13 //as. 
However, this rough calculation assumes uniform distribution of 
the stars across the sky, with the same mean density (55 deg"^) 
as in the demonstration run. Considering that large parts of the 
sky have a much lower density (typically 5-10 deg"^; see Fig.|9] 
top), which implies a less precise attitude at the corresponding 
times, and that we have also neglected the attitude modelling er- 
rors, which here amount to at most 9 //as RMS (Sect. 7.2.1 1, it is 
not unreasonable that the overall AL attitude uncertainty is about 
50% larger than according to the above calculation. 

The demonstration run uses only 2% of the stars envis- 
aged for the final AGIS solution, and the majority of them are 
faint, whereas the real primary stars are preferentially selected 
amon g the brighter stars when possible (cf. the discussion in 
Sect. 6.2.2 1. For a model distribution of 10^ primary sources 



similar to the one described by Hobbs et al. (20101, the AL 
observational uncertainty corresponding to the average statisti- 
cal weight is more like 200 //as, rather than the 650 //as in the 
present data. On the other hand, the attitude knot interval will 
also be much shorter than the 240 s used in the present run, per- 
haps even as short as 5 s, which is about the shortest knot in- 
terval that can reasonably be used in view of the normal CCD 
integration time of 4.42 s. Combining these numbers we esti- 
mate that the final AGIS run on the real Gaia data might obtain 
an AL attitude uncertainty, due to the photon noise, of about 
(20 //as) X (200/650) x [0.02 x (240/15)]'/- ^ 6 //as. To this 
should be added the attitude modelling error (i.e., how well the 
spline can represent the effective attitude), which is difficult to 
estimate without more reliable information about attitude irreg- 
ularities ( Appendix |D.4| l. Generally speaking, the optimum knot 
interval will roughly balance the estimation and modelling un- 
certainties, so that the total uncertainty is less than twice the esti- 
mation uncertainty. Assuming a total AL attitude error of 12 //as 
RMS, this would give less than 4 //as RMS to be added quadrat- 
ically to the parallax uncertainties. Thus, the attitude contribu- 
tion to the final astrometric parameters appears to be relatively 
small even for the bright stars. However, this does not take into 
account the additional complications caused by the gated obser- 
vations (Appendix |D.3| l and residual calibration errors due to, 
for example, CTl effects (Appendix D.2 1. 



7.2.4. Calibration results 

The calibration model used for the run merely contained one ef- 
fect (A^AL = 1 in Eq. 20 1, viz., the large-scale calibration At/ in 
Eq. ( 15 I accounting for geometric distortions of the CCDs and 
optical eff'ects which are indistinguishable from geometric irreg- 
ularities of the focal plane. On this account, we expect the simu- 
lated basic angle signal (see beginning of Sect. 7.2 1 to manifest 
itself through a corresponding time-dependence of the AL large- 



scale calibration parameters ArjQfnQj. The asterisks in Fig. 15 
show, for / = PFoV and for each time interval j, the calibration 
parameter values of the demonstration solution averaged over 
the 62 CCDs (n). The solid line depicts the step-sinusoidal input 
basic-angle signal applicable for PFoV. As anticipated, the esti- 
mated calibration parameters are in very good agreement with 




Fig. 15. Variation of the AL large-scale calibration parameters 
(averaged over all CCDs) in the preceding field of view (PFoV), 
as a function of the time since the beginning of the mission. The 
step-sinusoidal curve is the expected variation due to the simu- 
lated basic-angle variation having a period of 2.5 yr and an am- 
plitude of 500 //as, but constant within each 30 day interval. The 
asterisks show the results of the solution (one value per 30 day 
interval), and the circles show the differences on the magnified 
scale to the right. Thanks to the constraint in Eq. ( 16 1, the mean 
calibration parameters in the following field of view (FFoV) ex- 
actly mirror the displayed ones, and are therefore not shown. 



the input signal: the RMS value of the differences is 0.20 //as, 
corresponding to an RMS error of 0.40 //as for the basic an- 
gle offsets AFj per calibration time interval (cf. Eq. 19 1. This 
is reasonably consistent with the expected precision of me large- 
scale calibration based on the total weight of the observations, 
as shown by the following calculation. The mean number of AL 
observations per calibration time interval and field of vie w is 

that 



1.33 X 10^ (cf. Table[T}. Assuming, as we did in Sect. |7.2.3 



an AL observation of average weight corresponds to a standard 
uncertainty of ctal - 650 //as, the expected precision of the ba- 
sic angle determination is 2'^^ x (650 //as) x [1.33 x 10^]"'^^ - 
0.25 //as. The observed scatter, 0.40 //as, is larger by roughly the 
same factor as found for the AL attitude errors. 

The good agreement between the input calibration signal and 
the recovered parameters demonstrates the correct functioning 
of the generic calibration model (see Sect. 3.4 1 in this simple 
case. We expect that many more validation runs will be needed 
to confirm this result in more complex circumstances, i.e., with 
more calibration effects of different functional compositions and 
dependencies. A further important aspect is the practical study 
of possible hidden correlations and degeneracies of calibration 
with source, attitude, and global parameters which may not be 
fully obvious at the mathematical level. 

7.2.5. Global results 

Starting the iterations from a PPN y value of 1.1, the purpose 
of running with the Global block was to see how such a grossly 
wrong initial value would affect the (overall) convergence rate 
and to what level the correct value could be recovered (recall 
that the input data were simulated using y - \). 

Figure 16 presents the evolution of =7-1 during the 
run. In iteration 135 the parameter had settled on the value gQ - 
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from a previous solution, already very close to the final one; 
a smaller number of iterations might therefore suffice. In sum- 
mary, the estimated processing time is clearly within the feasible 
range. 



-5x10" 
-1x10 



60 80 
Iteration number 



Fig. 16. Evolution of the estimated global parameter go = y - I 
(where y is the PPN parameter) as a function of the iteration 
number, go settles at a level of 2.6 X 10"^ from a starting value of 
go - 0.1. The formal uncertainty of in this solution is 2.15 x 
10"^. Note that a linear scale is used for \go\ < 10^* (the grey 
area), while a logarithmic scale is used outside of this interval. 



2.62 X 10"^, with a formal standard uncertaint}j^of 2.15 x 10"^. 
At +1.2 standard deviations, this value of is not significantly 
different from 0. The run shows that the system is capable of 
recovering the 'correct' value with an error compatible with the 
statistical uncertainty, which in this case is set by the photon 
noise of the individual observations. 

The total astrometric weight of the AL observation s in the 
demonstration run is ^ 4000 juas"^. According to ,Hobbs et al. 
( 2010| l this should yield an RMS uncertainty in y of about 
3 X 10"^, in reasonable agreement with the formal standard un- 
certainty given above and the parameter value obtained in the 
solution. 



7.3. Processing times 

The demonstration solution was run on an IBM cluster at ESAC 
using 14 nodes (out of 32 available), each node having two pro- 
cessorj^with four cores each; thus in total 1 12 CPUs were en- 
gaged. This configuration of 14 nodes is estimated to have a total 
floating point performance of 0.65 Tflop s"' (0.65 x 10'^ floating 
point operations per second). One iteration took about 1 hr (with 
high CPU occupancy, typically 90%), so the total run time for 
135 iterations was nearly 6 days, corresponding to about 3 x 10'^ 
floating point operations (flop). 

Scaled up to the projected 10** primary sources of the final 
AGIS run this would amount to 1.5 x 10'^ flop. Using a more 
conservative estimate of 5 x 10'"^ flop to account for additional 
features not included in the demonstration run, this will require 
some 60 days on a typically targeted 10 Tflop s ' machine. On 
the other hand, there could also be a significant saving in com- 
puting time due to the fact that the final solution will start off" 



8. Conclusions 

A fundamental part of the scientific data processing for the Gaia 
mission is the astrometric core solution, which will be run during 
and after the mission (ca. 2013-2020) with successively larger 
datasets and eventually encompassing at least some 100 million 
primary sources. This solution is central to the performance of 
the mission as a whole, since it not only provides the astrometric 
results for these primary sources, but also the reference frame 
(in the form of the instrument attitude as a function of time) and 
the geometric calibration of the instrument, for use by a large 
number of other processes in the overall scientific reduction of 
the Gaia data. 

In order to accomplish the astrometric core solution, a 
software system known as AGIS (Astrometric Global Iterative 
Solution) is being built within Coordination Unit 3 (CU3) of 
the Gaia Data Processing and Analysis Consortium (DPAC). As 
detailed in this paper, the necessary mathematical models and 
numerical algorithms are well understood, and have been de- 
veloped with sufficient rigour to allow the potential accuracy of 
Gaia to be fully exploited. Most critical parts of this system have 
been implemented, and numerous test runs have demonstrated 
the theoretical validity of the global iterative approach, as well 
as its practical feasibility in terms of data management and com- 
putations. While a number of additional features will have to be 
included in the software before it can be considered ready for 
the flight data, and many more complications will undoubtedly 
be discovered during the actual analysis of these data, all the 
fundamental parts of AGIS are already in place. 

In 2011, with roughly two years left until the launch of the 
satellite and time to mature the concepts and software presented 
here into a robust operational system, we have no reason to doubt 
that AGIS will be able to compute an accurate astrometric solu- 
tion, consistent with the ambitious goals of the Gaia mission. 
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Appendix A: Quaternions 

Quaternions form a non-commutative algebra in R"*. Invented by 
W. R. Hamilton in 1843 as an extension of the complex num- 
bers (Hamilton 18431, their most common usage today is for 
representing spatial rotations in a particularly compact and con- 
venient way, with applications for example in computer graphics 
and spacecraft attitude control. Quaternions can be introduced 
and understood in many different ways, with a correspondingly 
confusing multitude of notations and conventions. Here we give 
just a brief introduction to the subject, stating only the minimum 
results needed for our applications. For more details, see for ex- 
ample ^Wertz^fJ^TS^) and Kane et al.| ( |1983| l. 



A.1. Quaternion algebra 

A quaternion is a quadruple of real numbers for which the fol- 
lowing algebraic operations are defined. For any quaternions 
a = {ox, fly, a,, fl,v) and b = {b^, by, b~, b^} we have addition 



a + b = jflj. -I- by, fly + by, fl, -I- b,, a„ + 



(A.l) 
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multiplication 



ab = I a^bw + Oyb^ - a^by + a„bx, 

-Oxby + Oybw + cizbx + a„by, 
a^by - Oybx + a~b„ + a^b^. 



(A.2) 



-Ojcbx - Qyby - a^b^ + a„b„ \ 



and scalar multiplication 



rsflv, sa,., sa,, sa^. 



(A.3) 



for scalar s. Subtraction is analogous to addition, as derived from 
a - b = a + (-l)b. The conjugate of a is 



■• = (- 



the norm (length) is 



||a|| = (flj + a; + fl^: + ai) 
and the inverse (provided ||a|| > 0) is 



2x1/2 



a-' = llair^a* 



We have 



(ab)* = b*a* , (ab) ' = b 'a 



(A.4) 



(A.5) 



(A.6) 



(A.7) 



Any non-zero quaternion can be normalized to unit length. In 
analogy with the notation for vector normalization, we use an- 
gular brackets for this operation: 



<a) = ||a|r'a. 



(A.8) 



The triplet of real numbers {a^, a,,, '^z) can be regarded as the 
coordinates of the vector a in some reference system S = [jr 3? z] 
(where x, j, z is a right-handed set of orthogonal unit vectors). 
Thus, we can write a = {S'a, a^^,], where S'a = [a^ a,, a,]' 
constitutes the so-called vector part of the quaternion, and 
the scalar part. Both scalars and vectors (in M?) can thus be 
seen as special cases of quaternions, namely, when the vector 
or the scalar part is zero. This allows us to write for example 
||a|p = aa* - a*a. In terms of the usual vector-scalar operations 
the quaternion multiplication can also be written as 

ab = jS' (a X ft -H ab^ + ba„) , a^b^, - a'ft} . (A.9) 

Note that the vector part of a quaternion only makes sense when 
expressed in some coordinate system (S in this example); a phys- 
ical vector cannot be part of a quaternion. 

A.2. Spatial rotations 

Unit quaternions (of unit length) are convenient for repre- 
senting orientations and spins of objects in three-dimensional 
space. This is compacter, numerically more stable and requires 
fewer arithmetic operations than the use of rotation matrices. 
Compared with the use of Euler angles, much fewer or no 
trigonometric functions need to be evaluated, and singularities 
are avoided. 

According to Euler's rotation theorem any change in the ori- 
entation of an object can be described as a rotation by a certain 
angle around some fixed axis. Let this axis be represented by 
the unit vector u and the rotation angle by (p, reckoned in the 
positive sense around the vector In the given reference system 



S = [a: J z], the rotation is then represented by the unit quater- 
nion 



|s'm 



sm— , cos — 
2 2 



(A. 10) 



The useful property here is that a sequence of rotations is repre- 
sented by the product of the corresponding unit quaternions (see 
Sect.[A31l. 

From Eq. ( |A.6| l it follows that the inverse of a unit quaternion 
equals its conjugate, so 



-S'm sin -, cos 



(A. 11) 



which represents a rotation by -(p around u. 

A rotation by the angle 27r around any axis is represented 
by the unit quaternion {0, 0, 0, -1 ). Since this operation is phys- 
ically equivalent to no rotation at all, it implies a sign ambiguity 
in the quaternion representation of any given rotation (modulo 
2;7r). This is potentially a problem only when the quaternion is 
used to represent a continuously changing orientation as a func- 
tion of time, as is the case for the Gaia attitude (Sect. |3.3[ ). It 
is then necessary to ensure that no sign flips occur, e.g., when 
converting from some other representation of the orientations. 



Equation (A. 10 1 suggests an alternative, non-redundant, way 
of representing spatial rotation, namely by means of the compo- 
nents of the vector = U(p. For any continuous rotation the three 
components could be continuous functions of time. This formal- 
ism is mainly useful for small rotations (||0|| <K 1); when applied 
for example to a spinning satellite the length of the vector may 
grow indefinitely, causing unacceptable numerical errors in fi- 
nite arithmetic. For the arbitrary vector <f> we nevertheless intro- 
duce the special notation Q(S'^) for the unit quaternion, in the 
reference system S, representing a spatial rotation by the angle 
<p = 1 1^1 1 about an axis parallel to <f>: 



Q(S'4>) 



{S'{<f>} 



{0,0,0,1) 



sm — , cos —} if > 0, 

2 21 ^ (A.12) 



ifS = 0. 



This notation is here only used when discussing the small rota- 
tional offset between the ICRS and AGIS frames in Sect. 16.1.11 



A.3. Vector and frame rotations 

Vector rotation and frame rotation are not standard notions in 
vector or quaternion calculus, but we have found them helpful in 
order to clarify the relations between vectors and their represen- 
tations in different reference systems. 

Vector rotation. Let {S'ro, 0} be the quaternion representation 
in the reference system S of the arbitrary vector ro. Rotating the 
vector an angle (p around unit vector u results in a new vector 
ri, whose coordinates in S can be calculated by two successive 
quaternion multiplications. 



{S'ri,0} = q{S'ro, Ojq" 



(A.13) 



where q is given by Eq. ( A.IO i and q - q*. We call the opera- 



tion in Eq. ( A.13 i for the vector rotation of ro by the quaternion 
q. This calculation requires the use of a particular reference sys- 
tem (S in this example) in which to express the vectors and the 
quaternion; the resulting physical vector r\ is however indepen- 
dent of the reference system. 
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Applying n successive vector rotations, specified by the 
quaternions qi, q2, . . . , q„, gives the vector r„ in 

{S'r„ , 0} = q„ ■ ■ ■ qzqi {SVo , O} q^'qj ' ■ ■ ■ q^' . (A. 14) 

This is equivalent to a single vector rotation by q = q„ ■ ■ ■ qaqi- 
All the quaternions are here expressed in the fixed reference sys- 
tem S, and the order of multiplication is from right to left. 



Frame rotation. A more common application in astrometry is 
where the reference system itself is rotated, say from So - 
[xq yo zo] to Si - [x\ y\ zi], by the quaternion q. Given the 
coordinates S^r of the arbitrary vector r in reference system So, 
we want to compute the coordinates S'^r of the same physical 
vector in the rotated reference system. By applying a vector ro- 
tation to each of the basis vectors we find 



S'lr, 0} = q-'{S[,r, ojq, 



(A.15) 



which we refer to as the/rame rotation of r by the quaternion q. 

It is important to note that the numerical representation of the 
quaternion q, representing a frame rotation from So to Si, is the 
same in the two frames. This follows because the components of 
the vector u are invariant under a frame rotation about the vector 
itself, i.e., S^m = S,')M. In Eq. (A. 10 1 the vector part of q can 



therefore be expressed in either of the two reference systems, 
i.e., we may take S = So or S = Si. The scalar part cos(0/2) is 
of course independent of the reference system. 

Successive frame rotations can therefore be accomplished by 
referring each rotation to the current set of axes, which is usually 
precisely what is needed. Let qi, q2, • . . , q,, be a sequence of 
frame rotations, from So to Si, and then from Si to S2, etc.; here 
qi is expressed in So (or Si), ({2 in Si (or S2), and so on. The 
corresponding transformation of the coordinates of the vector r 
is given by 

[S'„r, 0} = q,;i ■ ■ ■ q^'qr' [%r, o} qiq2 ■ ■ ■ q„ , (A.16) 

equivalent to a single frame rotation by q = qiqi ■ ■ 'qn- The 
quaternions are here expressed in the concurrent reference sys- 
tem, and the order of multiplication is from left to right. 



Algorithm A.l For given A - [A„ A.^; Ay^ ■ ■ ■ ], this al- 
gorithm returns a unit quaternion q = [q^, qy, q~, q^] such that 
Eq. ( A. 19 1 is satisfied. 



1 : .? <— A ,- V + + A-- 

2: for ; = x, y, z do 

3: |?,|«-[(l-,v)/4 + A„/2]'/2 

4: end for 

5: ^[(l + .v)/4]'/2 

6: if l^;,! > max(|gy|, \q^\) then 
7: i <— X, j <^ y, k (— z 

8: else if \qy\ > max(|^j.|, |^.|) then 
9: i <^ y, j <^ z, k <^ X 

10: else 

11: I «— z, j <— X, k <— y 

12: end if 

13: qi ^ \qi\sign[Aj,,-Atj] 

14: qj ^ \qj\ sign[9,(A,j + Aji)] 

15: qi, <^ \qk\ sign[^,(Art. + A^,)] 



A.5. The attitude matrix 

For completeness, we give here the full relations between the 
attitude matrix defined in Sect. |3.3| and the quaternion represen- 
tation of the attitude. Let C = [Z F Z] be the celestial reference 
system (CoMRS; Sect. |3.1| l and S = [a: j z] the instrument sys- 
tem (SRS). The attitude matrix A is defined by Eq. ([8]l. The at- 
titude quaternion q - {q^, qy, q^, qu,} gives the rotation from the 
CoMRS to the SRS, i.e., {C'jc,0) = q{C'Z,0)q etc. Then 



(A. 19) 



1 - 2(^5 + ql) 2{q^qy + q,q„) liq^q, - qy.qj 
2(qxqy - qzqw) 1 - 2(ql + qj) 2(qyq, + q_yq„) 
2(qj,q, + qyq„) 2{qyq, - q^q,,) 1 - 2{ql + qj) 

The conversion from A to q is less straightforward if we 
want to avoid numerical problems for certain attitudes. A sta- 
ble algorithm was given by Klumpp] ( 1976j ). In our nota- 
tion, using pseudo-code, it is given by Algorithm |A.1| Note 
that {qy, qy, q., q„} and {-q_y, -qy, -q^, -qw} correspond to the 
same A, while the algorithm always returns a quaternion with 
q„ > 0; a reversal of the signs may therefore sometimes be re- 
quired to ensure the temporal continuity of the quaternion com- 
ponents. 



A.4. Angular velocity 

Let q be a unit quaternion, expressed in the non-rotating refer- 
ence system C, and let us assume that q is a differentiable func- 
tion of time. The angular velocity £2 associated with the time- 
dependent vector rotation by q is the same for all vectors, and 
given by 

{C'£2, OU 2qq-' , (A. 17) 



where the dot signifies the time derivative. This result can be de- 
rived by taking the time derivative of the vector rotation formula 
for arbitrary vector r, using r = £lx r and Eq. (A. 9 1. 



Let S be the reference system obtained after rotation by q. 
The coordinates of the angular velocity vector in the new system 
are found by performing a frame rotation of Eq. ( A.17 1 by q; 
thus 

{S'n, 0} = q^'jC'n, ojq = 2q"'q . (A.18) 

These expressions show, for example, how the instantaneous an- 
gular velocity of Gaia can be calculated either in the celestial 
reference system (CoMRS), using Eq. (A.17i, or in the instru- 
ment system (SRS), using Eq. (A.18 1, from a knowledge of the 
attitude q and its time derivative q. 



A.6. Differential rotation 

Up until now, the quaternion formulae given in this Appendix 
hold rigorously for arbitrary rotations. We now derive a useful 
result, which however is only valid to first order in the (small) 
rotation angles. It can be used, for example, to compute the atti- 
tude error angles about the SRS axes, as was done in Sect. |7. 2.3] 
Let qo and qi be given unit quaternions, representing the two 
nearly co-aligned reference systems So = [x^ yo Zo] and Si = 
[xi yi Zi] with respect to some third (common) reference system 
such as the ICRS. It is required to express the difference between 
S 1 and So by means of three small angles 0^, <py, (p, representing 
rotations about the axes in either system. More precisely, let <f> = 
[<pjt (py <f),]' be the spatial rotation that brings So into coincidence 
with Si. Assuming that ||^|| <«c 1 and neglecting terms of order 
ll^lP, we have Si - So -H ^ x So, or 



S'iSo^/-H((/ixSo)' So 



1 



1 



(A.20) 



According to Eq. ([8]) this matrix describes the orientation of Si 
with respect to So. If d is the quaternion representing a frame 
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rotation from So to Si, we have (Sect. [A3] l qi = qod and hence 
d s {d„ dy, d-, £/,,) = q„'qi . (A.21) 



Given that ||^|| is small, dx, dy and d, are also small quantities, 
while Iduil ^ 1. Due to the sign ambiguity of the quaternion 
representation (Sect. A. 2 1, t/„, could however have either sign. 



Comparing Eqs. (A.19 1 and (A. 20 1 we find, to first order in the 
small quantities. 



'2^xdw 



2dyd„ 



2d,du, , 



(A.22) 



where the factor du, (being close to ± 1 ) guarantees that the angles 
obtain their correct signs. Equations (A.21 i-(A.22i provide the 
required transformation from (qo, qO to (0.t, (py 4'z)- 



Appendix B: Splines 

A spline is a piecewise polynomial function Sit) defined on 
some interval [fbeg^fendl- That is, if the interval is divided into 
K > sub-intervals by means of the instants tk (called knots), 
such that fbeg - to < ti < ■ ■ ■ < Ik - fend, then in the sub-interval 
tk < t < tfi+i the spline function S{t) equals some polynomial 
Pkit), k = . . . K - 1. The splines of interest here consist of 
polynomials of some fixed order M (or degree M - 1); typically 
M = 4, corresponding to cubic splines. Moreover, the splines are 
usually maximally smooth, i.e., 5 *'"'(?) = d^S'/df™ is continuous 
for fbeg < f < fend and m = . . . M - 2. 

The K polynomials of order M require KM coefficients 
for their specification. For a maximally smooth spline, there 
are M - 1 continuity conditions for each interior knot, namely 
S^'"\tk-) = S'"'\tk+) for m = . . . M - 2 and = I...K -I; 
thus a total of {K-l){M-l) constraints. The spline consequently 
has KM - (K - 1){M - I) ^ K + M - I degrees of freedom]^ 
In the context of least-squares fitting, this equals the number of 
unknowns (parameters to fit), which we denote by A^. In the fol- 
lowing we take and M to be the characteristic numbers of the 
spline, rather than K and M. For a maximally smooth spline we 
haveK ^N-M+l. 



B.1. B-splines 

There are many different ways in which a spline could be rep- 
resented (parametrized). The approach taken here is to consider 
S(t) as a linear combination of some basis functions B„(t), 



(B.l) 



«=o 



where c„ are coefficients to be determined. Choosing the basis 
functions to have minimal support (see below) for the given or- 
der and smoothness leads to the so-called B-splines ( de Boor| 

|2oori i. 

The B-splines are uniquely defined by the adopted knot se- 
quence. In the following we shall use t„ (rather than t„) to denote 



When the spline is used for interpolation, it is typically chosen to 
go exactly through the K + 1 values S(tk) for k = . . .K. This leaves 
(K + M - I) - (K + I) = M - 2 degrees of freedom. Thus, for a cubic 
spline (M = 4), two more conditions must be imposed for the spline 
to become unique. The most common choice is to make the second 
derivative vanish at fi,e„ and fend - This defines the 'natural' interpolating 
spline. By contrast, when splines are least-squares fitted to data, as in 
the attitude determination problem, there are typically many data points 
per sub-interval, and no need for special endpoint conditions. 




13 14 [5 
Coordinate t 



Fig. B.l. The first six B-splines of order M - 4 (cubic) defined 

on the regular knot sequence tq, ti, For better visibility, 

each B-spline is vertically displaced by one unit, and the non- 
zero parts are drawn in thick lines. 



the knots associated with the B-splines. The reason is that the 
knot sequence for the B-splines has a slightly more general in- 
terpretation than just the simple division of [fbeg, fend] considered 
above. Moreover, when fitting the spline to given data points, 
this allows us to use tk (for example) to denote the time associ- 
ated with the kth data point, without ambiguity. 

The knot sequence must be non-decreasing, tq < ri < T2 < 
■ ■ ■ , and at least two knots must be diff'erent. Very often we use a 
regular knot sequence in which t„+i = t„ + At for some At > 
(the knot interval). Figure B.l shows the first six cubic B-splines 
defined on a regular knot sequence. Note that the non-zero part 
of each B-spline, shown in heavy line, stretches over M consec- 
utive knot intervals. We use the convention that the B-splines are 
labelled with the same index as the left-most knot of its non-zero 
part; therefore, the support of B„(t) is (t„, t„+m)- 

It is not possible to construct a maximally smooth spline 
function of order M that is non-zero over less than M consec- 
utive knot intervals. This is what we mean by the statement that 
B-splines have minimal support: they could not be shortened 
without loosing some smoothness. This is an important property 
for the least-squares fitting, as shown by the following consider- 
ation. Least-squares fitting of ( |B.l| i to a given set of data points 
(tk, Zk) (where tk e [fbeg, fend] for each k) will be done by forming 
normal equations. The normal equations matrix is a symmet- 
ric, positive definite matrix of dimensiorj^A^ x A^. Since A^ may 
be very large, it is desirable that the matrix is sparse, i.e., that 
most elements are zero. It is easy to see that element N/j will be 
non-zero as soon as Bi{tk)Bj{tk) + for some data point k. To 
make the matrix maximally sparse, we should therefore choose 
basis functions with minimal support. B-splines have minimal 
support and are therefore ideal for least-squares fitting using 
sparse matrix algebra. 

Since the support of a B-spline of order M extends over at 
most M consecutive knot intervals, we have Nij - for |i - 



In the attitude determination problem, each of the four components 
of the quaternion is represented by a spline, so the number of param- 
eters is actually 4A' and the normal matrix is of dimension 4^ x 4A'. 
Alternatively, we may take the elements of A' to be sub-matrices of 
dimension 4x4. The indices z and j used below refer to these sub- 
matrices. 
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Algorithm B.l For given spline order M, knot sequence {t„), 
time t, and left index £ (such that tc < t < tu\), this algorithm 
returns {bo . . . bM-\] such that - B(-M+i(t), ■ ■ ■ , ^m-i = B({t). 



"spline interval" 
(only obsen/ations in this inten/al are used, 
and the fitted spline is only valid here) 



for ;■ 



1 



2 do 



= to M - 

Lj^t- Tc-i 

for 7 = to ; do 

U<r-bjl{Rj+Li_j) 

bj s + RjXu 

s <— L,_j X u 
end for 
bi+i «- J 
end for 



j\ > M - I. This shows that is a symmetric banded matrix 
with (half-) bandwidth equal to M - 1. Cholesky decomposition 
preserves the band structure of the matrix and is therefore ideal 
for solving the normals; it is also numerically very efficient. 

B.2. Calculation of B-splines 

At any given point t there are at most M non-zero B-splines, 
namely Bc-M+i{f), Bi^m+i, ■ ■ ■ B((t), where ^ is the 'left index' of 
t, such that Tf < t < Tf+i. They can be computed simultaneously 
in a numerically stable way by means of de Boor's algorithm 
( de Boor|2001 1, which is given as pseudo-code in Algorithm B. 1 



If needed, the derivatives of the B-splines with respect to t can 
be computed simultaneously with little additional effort. 

By inspection of the algorithm it is found that the knots used 
for computing the B-spline values in the interval [tc, T(+\) are 
Tf-M+2 through Tf+M-\- For example, with reference to Fig. |B.l| 
(with M - 4), the B-splines between T3 and T4 (i.e., for left 
index £ - 3) depend on {ti, T2, ■ ■ ■ , re), but not on tq or T7, 
even though Bo{t) in general depends on ro and B^it) depends 
on T7. 

B.3. Use of multiple knots 



Algorithm |B . 1 1 works also in the case when several consecutive 
knots are placed at the same t coordinate. Having a knot of mul- 
tiplicity m (i.e., m knots at the same t) removes the continuity 
for derivatives of order M - m and higher The normal situa- 
tion is that the knots have multiplicity 1, which means that the 
spUne is continuous at the knots up to and including its (M-2)th 
derivative, but discontinuous in its (M - l)th derivative. By in- 
serting multiple knots at some specific instant, one allows the 
spline to become less smooth at this point. For example, in a cu- 
bic spline (M = 4) a triple knot (m = 3) allows the first derivative 
to become discontinuous at that point, while leaving the spline 
function itself continuous. In the Gaia attitude processing, mul- 
tiple knots will be used for modelling the effects of microme- 
teoroid impacts, which cause (almost) instantaneous changes in 
the angular velocity, corresponding to discontinuities in the first 
derivative of the attitude spline. 

Multiple knots may also be used at the endpoints of the 
spline interval [fbeg, fend]- At any point in this interval, there must 
be exactly M non-zero B-splines in order that a linear combina- 
tion of them should to be able to represent an arbitrary spline of 
order M. Again, with reference to Fig. B.l (for M = 4), we see 
that this is the case to the right of 73 (or tm-i in general). Thus 
we should put tm-i - fbeg- The first M - I knots can in princi- 




first B-spline 

e„('). 



last B-spline 

e«-,(') 



these M-1 anterior knots 
can be placed anywhere 



N-M interior knots 
(including any multiplicity) 



these M-^ posterior knots 
can be placed anywhere 



Fig. B.2. Illustration of the knot placement for a spline of order 
M (e.g., M = 4 for a cubic spline) with degrees of freedom, 
[fbeg, fend] IS the spHnc interval over which the spline is fitted to 
given data points. The end knots tm-i and tn are at the endpoints 
of the spline interval. The N-M interior knots are chosen to give 
the spline the desired flexibility, including multiple knots where 
required. The placement of the anterior (r < fbeg) and posterior 
knots (t > fend) is in principle arbitrary: it does not change the 
fitted spline in [fbeg, fend], but may affect the condition number of 
the least-squares fit. The parameters of the fitted spline are the 
coefficients cq, ... c^^i of the B-spUnes Bo{t) through BN-i(t). 



pie be placed anywhere, as long as tq < ri < - - - < tm-i- any 
such arrangement will produce M non-zero B-splines in the next 
sub-interval {tm-i,tm), and although the B-splines are different 
depending on the arrangement of the knots, they are always lin- 
early independent and therefore can be used as a basis for the 
spline. In particular, it is possible to put the first M knots at the 
same point, i.e., tq - ti - ■ ■ ■ = tm-i = fbeg- 

Corresponding considerations apply to the right limit of the 
spline interval: with degrees of freedom, the support of the 
last B-spline B}^_i(t) extends from t^_i to Tf^+M-i- The spline 
interval must end at = fend, and the remaining M-1 knots can 
be placed anywhere provided that < t^+i < - - ■ < t^+m-i- In 
particular, it is possible to have an M-fold knot at the endpoint 
of the spline interval, i.e., fend = '''n - ^w+i = ■■ ■ = t^+m-i- 
Figure |B.2| summarizes the placement of knots in relation to a 
given spline interval for given order M and degrees of freedom 
(number of spline coefficients). 

Although the placement of the first and last M-1 knots is ar- 
bitrary, and does not change the resulting spline between fbeg and 
fend, their placement does affect the numerical stability of the re- 
sulting least-squares system. Butkevich & Klioner (unpublished 
technical note) has pointed out that collapsing the anterior and 
posterior knots into the end knots, so that tq - ti = ■ ■ ■ - Tm-\ = 
fbeg and fend - tn - tn+1 - ■ ■■ = tn+m-\) results in a system 
with much smaller condition number than a regular sequence 
extending beyond the endpoints. For a cubic spline (M = 4) 
the spline interval then begins and ends with 4-fold knots as il- 
lustrated in Fig. |B.3| Ho wever, the use of data segmentation, as 
described in Sect. 5.2.2 may not permit this device. 



Appendix C: A robust Cholesky algorithm for 
positive semidef inite matrices without pivoting 

C.1. The use of normal equations 

The least-squares problems considered in this paper are solved 
by the method of normal equations (here denoted Nx = b), us- 
ing the Cholesky algorithm to decompose the symmetric normal 
matrix A^. This method is known to be computationally efficient 
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A/-4 interior l<nots 



Fig. B.3. Definition of a regular knot sequence for fitting a cubic 
spline (order M = 4) in the interval [fbeg, fend] - The spline interval 
is divided into N - 3 knot intervals of equal length At = (fend - 

tbes)KN - 3). 



and accurate for well-conditioned problems (e.g., Stewart] 1998[ l, 
and is moreover well adapted to in-place manipulation of sparse 
matrices such as the band matrix obtained when fitting B-splines 
(Fig.ig 

The method of normal equations for the general least-squares 
problem is usually discouraged in the literature, due to the much 
superior stability and accuracy of alternative methods operating 
directly on the observation equations Ax ^ h (where = A' A 
and b = A'h), e.g., using Householder orthogonal transforma- 
tions (e.g., |Bjorck|[l996[ ). However, when working with very 
large problems that are inherently well-posed, thanks to a good 
design of experiment and reduction model, our experience is that 
the method of normal equations is nearly always adequate in 
terms of accuracy, and then has the edge over other methods in 
terms of speed, storage and simplicity of the code. Moreover, in 
these problems, iterative improvement of the solution is usually 
required for other reasons (non-linearity, elimination of outliers), 
which partly compensates for the loss of precision when forming 
the normal equations. 

C.2. The Cholesky algorithm 



The standard Cholesky algorithm (e.g., Bjorck|[r996 Golub & 
van Loan 1996 1 requires that is positive definite, which is 
always the case for a well-conditioned least-squares problem. 
Given and b, the solution of the system Nx = b proceeds in 
three steps: 

CI. Use the Cholesky algorithm to find the unique upper- 
diagonal matrix U, with positive diagonal entries, such that 
N = U'U. 

C2. Solve the lower-triangular system U'y = b. 
C3. Solve the upper-triangular system Ux - y. 

The matrix U (or its transpose) is known as the Cholesky factor 
or square root of A^. As all the computations can be made in- 
place, we have symbolically 



ci 



C2 



C3 



[N\b]^[U\b]^[U\y]^[U\x] 



(C.l) 



The extension to an arbitrary number of right-hand sides (to 
the right of the vertical line) is trivial. For example, the in- 
verse A^"' can be obtained by inserting the identity matrix for 
b. We also note that CI and C2 are mathematically equivalent to 
pre-multiplying the matrix and right-hand side, respectively, by 
([/')'■ Performing C2 on the unit vector e,- - [0,0, . . . , 1, . . . ,0]' 
(with 1 in position /) thus produces e, - (?/') such that 
e'jCj - e'.U'HU'Y^e i - e'-N'^Cj - {N'^)ij. Selected elements 
or sub-matrices of A^"' can thus be obtained without computing 
the full matrix. 



Algorithm C.l Returns upper-triangular U such that U'U = A^, 
where A^ € R"^" is symmetric and positive semidefinite. Also 
computes an estimate d of the rank defect of A^. 

1 
2 
3 
4 
5 
6 
7 
8 
9 
10 
11 
12: 
13: 
14: 
15: 
16: 
17 
18 
19 
20 
21 
22: 



for 7 = to n - 1 do 
for ! = to j do 

if I < j then 
if Uii > then 

else 

end if 
else 

if U,, 

else 

end if 
end if 
end for 

for ;■ = 7 + 1 to ?! - 1 do 

end for 
end for 



- 

i- > then 

- 

- 0, d^d+\ 



Algoritlim C.2 Returns y such that U'y 
upper-triangular and ^ G R". 



b. where U e 



for z = to ;i - 1 do 
if Uii > then 

>'/ ^ 0'/ 
else 

end if 
end for 



21=0 Ukiyk)IUii 



The above three steps are accomplished by Algorithms |C.lf 



C.3 for arbitrary symmetric positive definite A^. (Actually, these 



algorithms include the non-standard modification discussed be- 
low in order to handle semidefinite matrices gracefully.) A few 
remarks should be made concerning its practical implementa- 
tion. First, the matrices U and A^ in CI, and b in C2, and x 
and y in C3 can share the same memory if it is acceptable that U 
overwrites A^, etc (in-place calculation). Second, since A^ is sym- 
metric, only the upper-diagonal part of it {Nij for / < j) is used 
in CI, and similarly for U. For large systems one can therefore 
save roughly half the memory by storing only the upper-diagonal 
parts of A^ and U, e.g., as one-dimensional arrays. The code in 
lines [T9|j2T| of Algorithm |C.1| is then irrelevant. Third, if A^ is a 
'skyline matrix' with envelope Ej (i.e., Nij - for / < Ej) then 
U has the same envelope: the Cholesky decomposition gives no 
fill-in above the envelope. This allows to store and decompose 
certain sparse matrices very efficiently, such as the band matrix 
inFig.|5] 

C.3. Application to semidefinite systems 

In several of our applications the normal matrix is however 
not positive definite, either from a lack of observations (e.g., 
data gaps in the attitude spline representation) or by design 
(e.g., for the calibration parameters). Application of the stan- 
dard Cholesky algorithm in such cases results in an exception 
which may be non-trivial to handle (for example, by changing 
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Algorithm C.3 Returns x such that Ux = y, where U e R"^" is 
upper-triangular and y eW. 



The numerical 
Algorithm 



C.l 



for ! = ;i - 1 to do 
if Uii > tiien 

else 

X, ^ 
end if 
end for 



the knot sequence of the attitude spline) and which may leave the 
partially solved system in an undefined state. However, that the 
Cholesky algorithm fails does not mean that there is no solution 
to the normal equations: on the contrary, there is an infinitude of 
solutions and the problem is rather which one to pick. Indeed, in 
many situations we could in principle accept any valid solution; 
for example, when the null space of the problem is known a pri- 
ori, and we are prepared to handle the associated non-uniqueness 
of the solution (cf. Sect. |6.1| l. For these and several other rea- 
sons it is advantageous if the computation can be continued in 
some sensible way, while of course noting the detected rank de- 
ficiency. 

A number of methods are available to handle rank- 
deficient or ill-conditioned least-squares problems. Singular 
Value Decomposition (SVD; e.g. 



Bjorck 1996 



Golub & van 



|Loan| [T996i [Press et "aL] |2007| l is the method often recom- 
mended; it provides the unique 'pseudo-solution' with the small- 
est Euclidean norm. However, SVD is computationally expen- 
sive and the pseudo-solution is not necessarily better than any 
other valid solution to the singular least-squares problem. 

By construction, the normal matrix - A' A is positive 
semidefinite: x'Nx - \\Ax\\^ > for any x (cf footnote [tJi. A 
modification of the standard Cholesky algorithm allows the de- 
composition in CI to be made also for such matrices, although 
U is no longer unique; similarly C2 and C3 can be modified to 
produce a valid (if non-unique) solution to the normal equations 
Nx - b. For example, the UNPACK routine xCHDC ( |Dongarra| 



et al.||1979) implements a robust version of the Cholesky algo- 



rithm, using complete pivoting (i.e., a simultaneous permutation 
of the rows and columns of A^) to generate the unique square root 
with non-zero elements only in the first r rows, if r < n is the 
rank of the matrix (for a detailed analysis, see Higham 2002). 
Other modifications of the Cholesky algorithm (e.g., ,Schnabel| 
& Eskow|1999 1 also uses pivoting. 



Permuting the rows and/or columns of the matrix is highly 
undesirable in the present applications. For example, when ap- 
pUed to a band matrix (such as Fig. |5]l it likely destroys the band 
structure, and in general prevents the envelope-based storing of 
the sparse matrices and U outlined above. While pivoting is 
never needed for the Cholesky factorization of a positive defi- 
nite matrix, it is thought to be an essential ingredient in mod- 
ified algorithms aimed at more general symmetric matrices. A 
simple modification of the Cholesky algorithm, which makes 
it applicable to the semidefinite case without pivoting was de- 
scribed by Lawso n & Hanson] ( |1974[ Eq. 19.5); see also Golub 



& van Loan^(1996 Eq. 4.2.1 1), who however warn that "it may 



be preferable to incorporate pivoting". Nevertheless we have im- 
plemented the corresponding modifications in Algorithms |C. If 



C.3 without pivoting, and find that they work quite well in our 



accuracy of the decomposition in 
was tested in MATLAB for a range of 
rank-deficient random positive semidefinite matrices 
(patterned after Higham 1990) by computing the quantity 
Pn - \\N - t7'i7||F/(M||A^||F) as a measure of the rela- 
tive error in units of the floating point precision. Here 
[trace(A^'A^)]''^ is the Frobeniu s norm of A^, and 



IIA^IIf 

M = 2 is the unit roundoff (Golub & van Loan 1996) of the 



double precision floating point arithmetic used. We find that 
the present algorithm, without pivoting, performs almost as 
well as LINPACK's xCHDC with complete pivoting, as judged 
from the statistics of our pf^ compared with the corresponding 
quantity pk reported by Higham. However, C.l is much less 
useful as a rank-revealing algorithm - the estimated rank defect 
is often much too small. 

Similarly, in order to check the numerical validity of the so- 
lution to the rank-deficient normal equations computed by C. 1- 
C.3, we made the following experiments, using the same matri- 
ces as above. For random vectors x we first computed b - Nx, 
then used C. 1-C.3 to recover a solution x (usually very dififerent 
from x). Finally we computed pt, - \\b - A^JE||F/(M||ft||F). We find 
that our algorithm performs almost as well as MATLAB's ba- 
sic solution xtilde = N\b, and much better than the minimum 
norm solution xtilde = pinv(N)*b (with default tolerance). 
For example, the 99th percentile of pi, was ~ 10'* for C.1-C.3, 
-10^ for MATLAB's backslash (\) operator, and ~10" (!) when 
using pinv. 

We conclude from these limited experiments that the present 
version of the Cholesky algorithm is a useful, simple and effi- 
cient substitute for much more sophisticated algorithms appli- 
cable in the semidefinite case. Since it does not use pivoting, 
it preserves the envelope of the matrix and is therefore espe- 
cially suited for banded matrices and envelope-based sparse ma- 
trix methods. It provides an estimate of the rank of the matrix, 
which however is rather unreliable. In the positive definite case 
the algorithm is equivalent to the standard Cholesky method. 

Appendix D: Complexities beyond tlie basic 
modelling 

Section[3]describes a set of baseline models for the sources, atti- 
tude, and geometrical instrument, which are believed to be real- 
istic enough to serve as an acceptable first-order approximation 
of the actual data for primary sources. Due to the many complex- 
ities of the real satellite and its operation, as well as the physical 
environment in space, there are however many additional effects 
that may affect the astrometric results at the /ias level, and which 
have to be considered in the final modelling. In this Appendix 
we discuss some of the known effects that will be addressed in 
future versions of the astrometric solution. 



D.1. Chromaticity 

Although the Gaia telescopes are all-reflective, with no refrac- 
tive elements in the optical paths to the astrometric field, they 
are nevertheless not completely achromatic. In the presence of 
odd wavefront errors, such as coma, the centroids of the opti- 
cal images do in fact depend on the wavelength, and hence on 
the spectral energy distributions of the observed sources For 
the typical wavefront errors expected in the astrometric field of 



applications. Algorithm C.l includes a rough estimation of the 
rank defect d - n - r. 



'Centroid' should here not be understood as the centre of gravity 
of the optical image; rather, it is a non-trivial function of the light dis- 
tribution, similar to the estimation of the image location k in Sect. |3.5| 
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Gaia (about 50 run RMS), the AL centroid shift from an early- 
type star to a late spectral type could amount to several mas. 
This systematic effect, known as chromaticity, can therefore be 
many times larger than the photon-statistical uncertainty of the 
estimated image location (cf. Table[T]l. It is thus essential to have 
a very good calibration of the chromaticity, for which it is neces- 
sary to know the spectral energy distribution of every observed 
source. This is obtained from the photometric observations in the 
BP and RP fields (see Fig. |3]l. 

Chromaticity is eliminated in the CCD signal analysis by us- 
ing a Line Spread Function (LSF), L{x) in Eq. ( [23] l, which is cor- 
respondingly shifted depending on the (known) spectrum of the 
source. The resulting AL pixel coordinate k is therefore in prin- 
ciple achromatic, and the effect need not be further considered 
in the astrometric solution. However, as mentioned in Sect. |3.4| 
it is envisaged to have diagnostic colour-dependent terms in the 
geometric instrument model of the astrometric solution. These 
calibration parameters should obtain negligible values in the so- 
lution if the chromaticity has been correctly accounted for in the 
calibration of L(x). Conversely, non-zero values can be used to 
improve the LSF calibration in the next processing cycle. 



D.2. Charge Transfer Inefficiency of the CCDs 



The CCD signal model in Eq. ( |23) assumes a perfectly linear 
detector, which is not exactly the case for the real detectors and 
especially not in the presence of radiation damage on the CCDs. 
Traps in the silicon substrate, produced by particle radiation in 
the space environment, will capture charges during the TDI oper- 
ation of the CCDs, and release them with delays ranging from a 
fraction of the TDI period to minutes. The charge capture and re- 
lease processes introduce a number of phenomena, collectively 
referred to as Charge Transfer Inefficiency (CTI = 1 - CTE, 



where CTE is the Charge Transfer Efficiency; [Janesick 2001 



CTI will affect all kinds of observations in Gaia (astrometric, 
photometric, spectroscopic). The most important phenomena for 
the astrometric observations are the apparent charge loss (be- 
cause part of the charges are released outside the observed win- 
dow) and centroid shift (because some charges are released with 
a delay of one or a few TDI periods). These and more general ef- 
fects of the radiation damage are the subject of extensive theoret- 
ical and experimental studies within the Gaia community (e.g., 
Seabroke et al 2009; Prod'Ho mme et al.|[20T0l [Prod'homme' 
et al.,_201 1) . The adopted method to handle these effects in the 
Gaia data processing is by means of forward modelhng using a 
so-called Charge Distortion Model (CDM; |Short et al.|20 10). In 
the context of the CCD signal model of Sect.|3.5| the CDM may 
be represented by the (non-linear) operator D: 



(D.l) 



where A^^ is the signal model at sample k in the absence of 
radiation-damage effects, e.g., according to Eq. ( [23] l. The vari- 
able represents the state of the CCD, e.g., in terms of how 
much radiation damage it has suffered. Note that the expected 
value of sample k depends on the CCD illumination history up 
to and including sample k, which is expressed by the CDM tak- 
ing as input the (undamaged) value not only for sample k but 
also for the preceding samples (k' < A:)|^One of the methods 



For the sake of illustration, the centroid could for example be the point 
obtained by fitting a Gaussian PSF to the image. 

A further complication is that most AF observations are binned in 
the AC direction before read-out, while the CTI effects operate indepen- 



employed for mitigating CTI effects in Gaia is through a peri- 
odic (e.g., once per second) electronic injection of charges in a 
few consecutive TDI lines. As the lines of charges travel across 
the CCD, most of the harmful traps are temporarily filled, thus 
reducing the CTI of subsequent charge transfers (Laborie et al. 



2007 ). The method has the additional benefit of periodically re- 
setting the illumination history of the pixels, so that in Eq. ( |D.l| l 
only the samples since the previous charge injection need to be 
considered ( [Short et al.|2010| l. 

The CDM depends on a moderate number of parameters that 
will be estimated in parallel with the LSF (or PSF) calibration 
prior to the astrometric solution (the 'Instrument response pa- 
rameters' in Fig.[T]). In principle, the subsequently estimated im- 
age location k should then not only be achromatic, as discussed 
in Appendix [D.l I but also free of CTI effects, so that the astro- 
metric solution can use a purely geometric instrument model, 
as required by the primary source model. Although this is ob- 
viously a highly idealised condition, it it nevertheless what the 
final data analysis must aim to achieve. 

For the simple image of a primary source, the centroid shift 
due to the CTI depends mainly on the magnitude of the source, 
the time since the previous charge injection, and the accumu- 



lated radiation dose experienced by the CCD ( Prod'homme et al 
201 1[ ). It is expected that imperfections in the CDM caUbration 



will likewise show up in systematic shifts depending primarily 
on these (known) quantities, and can be represented by a set of 
diagnostic calibration functions in the generic calibration model 



of Sect. 3.4 Non-negligible values of the diagnostic parameters 
in the astrometric solution indicate that the CDM is not doing its 
job properly, and they can then be used to improve the model. 
The reader is referred to the cited papers for quantitative infor- 
mation on the expected level of CTI effects in Gaia data, the ef- 
fectiveness of different mitigation strategies, and the associated 
performance degradations. 

D.3. Effects of the finite CCD integration time 

Up until now we have regarded the astrometric observations 
of Gaia as instantaneous measurements of the crossings of the 
source images over the fiducial 'observation line' (Fig. |4]| at the 
centre of the CCD (or of the gated portion of the CCD) in the 
AL direction. In reality, due to the finite integration time (T) of 
the CCD observations, any measurement clocked into the CCD 
readout register at time t actually depends on the average attitude 
and source position over the preceding integration time interval, 
[t - T,t]. The time delay is, to first order, taken into account by 
associating the measurement with the observation time t - T/2 
(Sect. 3.5 I. Following Bastian & Biermann ( 2005 | l we should, 
more precisely, assume that the observed location k of the image 
centre in the pixel stream is a weighted mean of the instanta- 
neous location A:,(f) of the optical image relative to the charge 
image during the preceding integration interval: 



e(T)K4t -T)dT 
/;.(r)dr 



(D.2) 



where e(T) is the (nominally flat) 'exposure function' for look- 
back time T, i.e., the rate at which electrons are produced, and 



dently on each pixel column. Thus the CDM should ideally be applied to 
the two-dimensional charge image, and the distorted charge image then 
binned for comparison with the one-dimensional data. This requires the 
use of a PSF replacing the LSF in Eq. 1 23 i. 
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transported to the read-out register, for constant illumination. 
The instantaneous location is given by 



/C.(f) = S [77,(0 - T]f„gj + k{t) , 



(D.3) 



where s is the local scale factor (pixels per radian), 77, (f) the in- 
stantaneous AL field angle of the optical image centre, and rifg„ 
the AL coordinate of the fiducial observation line for the appro- 
priate AC coordinate, field of view, etc. The function k(t) is the 
inverse of f^, relating the time coordinate to the TDI period in- 
dex k. For the present discussion k(t) is regarded as a continuous 
function, ignoring the step- wise transportation of the charge im- 
age in TDI modePj 

Recalling that 77,(0 is decreasing with time (cf Fig.[3]l, while 
k(t) is increasing, it is seen that a:,(0 remains approximately con- 
stant throughout the integration. Let us denote by the exact 
time when the centre of the optical image crosses the fiducial 
observation line, so that 77,(fc) - Vfng, and let = k(tc) be the 
corresponding pixel coordinate. If the speed of the optical image 
exactly matches the speed of the charge image, or sfj^ + k - 0, 
it is seen that a:,(0 is indeed constant and equal to k^^. Let us 
now consider what happens if there is a mismatch between the 
speeds. This could be caused by (i) a deviation in the local scale 
value s due to optical distortion; (ii) a non-nominal local scan 
rate; and (iii) that the object itself has significant motion (e.g., 
an asteroid). Assuming that k is constant and adopting a Taylor 
series expansion for the AL field angle over the exposure time, 
we have 



kit) - Kc + (t - t^)k , 

1 9 

ri*{t) = J]f„g + (t- Qi], + ijit- Q% + ■ 



(D.4) 
(D.5) 



Inserting in Eq. (D.3 1, and assuming that s is constant across the 
section of the CCD in question, we obtain by means of Eq. ( |D.2| i 

1 



K-Kc + (sTjt + k)ei + -577,62 + 



where 



f g(r)(r/2-r)'"dT 
X%(r)dr 



7M = 1, 2, 



(D.6) 



(D.7) 



are the normalized moments of e(T) relative to the exposure mid- 
time at T = T 12. For a constant and matching image speed we 
recover = /Cc as expected. In the general case of imperfect 
speed matching and non-constant scan rate there is a difference 
which should be taken into account in the astrometric solution. 
If we assume that s is known, the speed mismatch i?), + k can 
be computed for every observation, and the factor e\ can then 
be estimated as an instrument calibration parameter using the 
generic calibration model in Sect. 3.4 e\ will depend (at least) 



on the CCD and gate used; but due to the accumulating radiation 
damage it is likely to evolve with time and could possibly have a 
magnitude-dependent component as well. In the next (quadratic) 
term we may know 77, (from the attitude) and 62 - 112 (for 
constant exposure function) to sufficient accuracy that it might 



Bastian & Biermann 



1 2005 I is 



The corresponding expression in 
their Eq. (6), in which k{t) is the (integer) index of the last TDI clock 
stroke before time t. Thus their K,{t) oscillates with an amplitude of 
±0.5 unit for every TDI period. The continuous approximation adopted 
in our Eq. |D3| is acceptable since we are always considering integrals 
covering a whole number of oscillations. 



be applied as a correction to the observed k. However, since most 
observations are ungated (giving maximum T), it may be bet- 
ter to adopt the uncorrected k for this maximum T as defining 
the effective attitude]^ and only correct the gated observations 
for the difference in ^2; hence they, too, will refer to the effec- 



tive attitude. A complication is that ?), in Eq. (D.61 should be 



computed from (unknown) physical attitude, but to first order 
it can be obtained from the effective attitude. Attitude irregu- 
larities on time scales shorter than T add further complications 
(see Appendix |D.4| i, but it is unlikely that higher-order terms in 
Eq. (D.6 1 can profitably be accounted for. 



D.4. Attitude irregularities 



The basic attitude model described in Sect. 3.3 uses a spline rep- 
resentation which is (normally) continuous in the angles spec- 
ifying the instantaneous orientation of the instrument, as well 
as in the first M - 2 derivatives of the angles, where M is the 
order of the spline (typically cubic splines are used, for which 
M - 4). The actual (physical) attitude is much more complex 
and in particular there may be discontinuities and irregularities 
on time scales that are too short to be adequately represented 
by a spline with the knot separations considered in the basic 
model. Low-frequency perturbations (< 0.01 Hz) are of no con- 
cern here, as they can be perfectly represented by splines. The 
most important contributors to perturbations at higher frequen- 
cies are thruster noise in the micro-propulsion system used for 
the attitude control; the discrete and partially stochastic nature 
of the control system (for example that the commanded thrusts 
are updated once per second); micrometeoroid impacts on the 
satellite; and various dynamical effects such as fuel sloshing and 
structural vibrations. 

The high-frequency perturbations due to the thruster noise 
and control system generate angular jitter of the physical attitude 
that has a significant amplitude relative to the astrometric accu- 
racies ultimately aimed at, but still small in comparison with the 
AL pixel size (^ 59 mas). Thanks to the TDI integration this 
high-frequency jitter is largely removed from the effective atti- 
tude by the averaging over the exposure time T. As a result, the 
shortest knot interval needed to accurately represent the effec- 
tive attitude is also of the order of the exposure time, or about 
5 s. The optimum knot interval may be longer, depending on the 
number and magnitudes of the primary sources available for the 
attitude determination, and on the actual level of perturbations. 

The expected frequency of micrometeoroid impacts of var- 
ious sizes can be predicted from the known velocity and mass 
spectrum of interplanetary particles. Each impact produces a 
quasi-instantaneous change in the angular velocity of the satel- 
lite, while the attitude angles are continuous across the impact 
time. It is expected that a few hundred impacts will occur ev- 
ery year producing a change in the AL angular velocity exceed- 
ing 1 mas s"' (which should be easily detectable), with the fre- 
quency roughly inversely proportional to the minimum change 
considered. Discontinuities in the physical attitude rate can be 
represented in the spline model by inserting multiple knots at the 
estimated times of impact, (Sect. 5.2.6 1. However, the effective 



attitude will instead see a linear change in the attitude rates over 
an interval equal to the exposure time T, centred on fi, which re- 
quires that multiple knots are inserted both at ti-T/2 and ti+T/2. 
(This treatment becomes more problematic in connection with 



The effective attitude is then the physical attitude convolved with 
the (average) exposure function for maximum T. It corresponds closely 
to the 'astrometric attitude' introduced by jBastian & Biermann ( 2005^ . 
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gated observations, when T is non-nominal.) Impacts will be de- 
tected by inspecting the observation residuals in connection with 
the attitude updating (Sect. 5.2.5 1. 

The detailed re-examination of the Hipparcos attitude by |van| 
[Leeuwen] ( |2005 [ [2007| l revealed numerous discontinuities of the 
along-scan attitude angle (scan phase) of several tens of mas. A 
large fraction of them could be linked to the beginning or end 
of eclipses experienced by Hipparcos in its highly elliptic orbit 
around the Earth. A likely cause is thermal re-adjustment of the 
solar-panel hinges, following a sudden change in temperature. 
As there is no change in the net angular momentum, but only 
a re-distribution of inertia, the attitude rates are the same be- 
fore and after a discontinuity. For Gaia it is estimated that such 
'clanks' will be negligible, but the attitude processing should 
nevertheless be capable of identifying instances, should they oc- 
cur, and to take appropriate measures. Due to the finite CCD in- 
tegration time, the apparent effects of a clank will be two discon- 
tinuities in the attitude rates, equal but of opposite sign, and sep- 
arated by the integration time T . Again, this can be handled by 
suitable modification of the knot sequence of the attitude spline. 
Like the micrometeoroid impacts, clanks will be detected during 
the attitude updating by means of the characteristic patterns that 
they generate in the observation residuals versus time. 

Appendix E: Tables of acronyms and variables 



Table E.l. List of acronyms 



Acronym Description 



Table |E.l] is a list of acronyms used in the paper. Table |E.2] lists 
the most important variables, with a short description and a ref- 
erence to where it is introduced or explained. 



AC 

ACF 

ACP 

AF 

AGIS 

AGN 

AL 

ASI 

BAM 

BCRS 

BP 

CCD 

CDM 

CPS 

CG 

CPU 

CoMRS 

CTE 

CTI 

CU2 

CU3 

DPAC 

EADS 

ESA 

ESAC 

FFoV 

EPA 

EoV 

GCRS 

HEALPix 

lAU 

ICRS 

LSF 

NSL 

PFoV 

PPN 

PSF 

RMS 

RP 

RSE 

RVS 

SI 

SM 

SRS 

SVD 

TB 

TCB 

TDI 

VLBI 

WPS 

XML 



ACross scan direction (Pig.pb 
ACross scan coordinate in tne Following FoV 
ACross scan coordinate in the Preceding EoV 
Astrometric Eield 

Astrometric Global Iterative Solution 

Active Galactic Nucleus 

ALong scan direction (Eig.jsjl 

Accelerated Simple Iteration 

Basic-Angle Monitor (Fig.|3j 

Barycentric Celestial Reference System 

Blue Photometer (Fig.jsjl 

Charge-Coupled Device 

Charge Distortion Model 

Calibration Eaint Star 

Conjugate Gradient 

Central Processing Unit 

Centre of Mass Reference System 

Charge Transfer Efficiency (of a CCD) 

Charge Transfer Inefficiency (of a CCD) 

DPAC Coordination Unit 2, 'Data Simulations' 

DPAC Coordination Unit 3, 'Core Processing' 

Data Processing and Analysis Consortium 

European Aeronautic Defence and Space company 

European Space Agency 

European Space Astronomy Centre 

Following Field of View (Fig.|2jl 

Focal Plane Assembly (Fig.pb 

Field of View 

Geocentric Celestial Reference System 

Hierarchical Equal- Area iso-Latitude Pixelisation 

International Astronomical Union 

International Celestial Reference System 

Line Spread Function 

Nominal Scanning Law 

Preceding Field of View (Fig.S 

Parametrised Post-Newtonian (relativity formalism) 

Point Spread Function 

Root-Mean-Square 

Red Photometer 

Robust Scatter Estimate (footnotellSl 
Radial Velocity Spectrometer (Fig.p| 
Simple Iteration 
Sky Mapper 

Scanning Reference System (Fig.[2]l 
Singular Value Decomposition 
TeraByte 

Barycentric Coordinate Time 

Time-Delayed Integration (CCD operation mode) 

Very Long Baseline Interferometry 

WaveFront Sensor (Fig.jsjl 

extensible Markup Language 
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Var. 



Description 



Ref. 



/f 



A 



a„ 

Bnit) 

B 
b 

bait) 
C 

c 

c 
D 
d 

E 

e(t) 
e 

f 
f 

h, 
G 

8 
8 
h 
I 

i 

i 

K 
K 

k 
k 
k 
L 

K 
I 
t 

M 
N 
N 
n 
n 
n 
n 
Pi 
Q 

q 

r 

n 
Ri 
S 
s 
s 

Si 

T 

tk 
ti 



the astronomical unit 
attitude matrix 

galactocentric acceleration vector 
attitude parameters 
quaternion coefficient for B„{f) 
B-spline function 
iteration matrix 

right-hand side of normal equations 

barycentric coordinate of Gaia at time t 

celestial reference system 

constraint matrix for the calibration parameters 

calibration parameters 

non-linear Charge Distortion Model 

update vector 

expectation operator 

CCD exposure function 

error vector 

field index (±1 for preceding/following) 

detector coordinates 

preceding, following viewing direction 

Gaia broadband magnitude 

gate index 

global parameters 

auxiliary data, e.g., ephemerides 

the identity matrix 

subscript denoting some source 

'short' calibration time interval index 

no. of TDI periods for CCD integration 

preconditioner matrix 

TDI index in CCD pixel stream 

'long' calibration time interval index 

iteration index 

Line Spread Function 

shifted Legendre polynomial of degree r 

subscript denoting some observation 

left index in a knot sequence 

spline order - M - A for cubic spline 

number of degrees of freedom for a spline 

normal equations matrix 

attitude parameter index 

CCD index 

dimension of the normal matrix 
nuisance parameters 
normal-triad component 
objective function to be minimized 
attitude quaternion 
normal-triad component 

degree of the small-scale calibration polynomial 

normal-triad component 

residual of observation / 

Scanning Reference System 

local scale factor 

astrometric ('source') parameters 

the astrometric parameters for source / 

light integration (exposure) time on CCD 

time for sample k 

time for observation / 

reference epoch for astrometric parameters 

reference epoch for frame rotator 

reference epoch for non-rotating ICRS sources 



Sect 
Eq. (|? 
Sect " 
Sect 



.1.4 



33 



Eq. (3Ur 
Eq. 15 

Sect 



53 



Eq. {72' 
SectTO 
Eq. 



App endix 
Sect. 143 



Eq. (Or 
Eq. (fn 



Eq. (jrr 

Sect 



Sect 4^ 
Sect. TI 



Br 



33 



App. F 
Eq. (I3D 



Eq. (TU 



Eq. (T^i 



Eq. (35 
Eq. (Ti 
Eq. (5l 



Eq. (24 



Eq. (TU 



315 



03 



3.6 



33 



33 



33 



33 



33 



33 



53 
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Table E.2. List of mathematical variables (continued) 



Var. 



Description 



Ref. 



6.1.4 



T7 



r6 



XT 



73 1 



Hi 



fpM reference epoch for moving ICRS sources Sect 

U relegation factor for primary-star selection Sect 

Mj proper direction to source / Sect 

Ui geometric direction to source / Sect 

V nullspace of the normal matrix Sect 
Wi weight matrix for source / Eq. (|57 
Wi downweighting factor for observation I Sect 
Wi statistical weight of observation / Sect 
X differential parameter vector Sect 
[X Y Z] celestial reference system (ICRS or CoMRS) Sect 
[x y z] Scanning Reference System (SRS) Fig 
a flux (image parameter) Eq. 
a, right ascension of source / at fgp Sect!" 
yS background level (image parameter) Eq 
Tc (conventional) basic angle Eq 
y PPN curvature parameter Eq 
di decUnation of source / at fgp Sect 
Ca excess attitude noise Eq 
Ci excess source noise (for source /) Eq 
6; excess noise in observation / Eq. ^Z^ 
e orientation correction (frame rotator) Sect 
^ across-scan (AC) field angle Eq. (|rZ' 
77 along-scan (AL) field angle Eq 
K along-scan (AL) pixel coordinate (image location) Eq 
/l regularization parameter for the attitude update Eq 
Ak intensity for CCD sample values Eq 
yu across-scan (AC) pixel coordinate (image location) Sect 
^ati proper motion in a (x cos 5,) for source ; Sect 
Hsi proper motion in S for source / Sect 
Uri radial proper motion for source / Sect 
Hq proper motion due to galactocentric acceleration Sect 

V number of degrees of freedom Sect 
'CJi parallax for source / Sect 
Q normalized RSE error of astrometric parameters Sect 
CTi formal standard uncertainty for observation I Sect 
T„ attitude spline knot n Sect 
(p along-scan instrument angle Eq. ( 
CO spin correction (frame rotator) Sect 



(TT 



(21 



T7 



T7 



171 



6.1 


4 


b.l 


2 



Tl 



